AD-A279  900 




A framework for segmentation using physical
models of image formation


Bruce A. Maxwell and Steven A. Shafer


CMU-RI-TR-93-29 




Robotics Institute

Carnegie  Mellon  University 
Pittsburgh, Pennsylvania 15213




10  December  1993 


© 1993 Carnegie Mellon University




This research was partially supported by the Avionics Laboratory, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, OH 45433-6543 under contract F33615-90-C-1465, ARPA Order No. 7597. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. government.


This document has been approved
for public release and sale; its
distribution is unlimited.




REPORT  DOCUMENTATION  PAGE 


Form Approved
OMB No. 0704-0188




1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: December 10, 1993
3. REPORT TYPE AND DATES COVERED: technical
4. TITLE AND SUBTITLE: A Framework for Segmentation using Physical Models of Image Formation
5. FUNDING NUMBERS: F33615-90-…; Order No. 7…
6. AUTHOR(S): Bruce A. Maxwell and Steven A. Shafer
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
8. PERFORMING ORGANIZATION REPORT NUMBER: CMU-RI-TR-93-29
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): Avionics Laboratory, Wright Research and Development Center, AFSC, U.S. Air Force, Wright-Patterson AFB, OH
10. SPONSORING/MONITORING AGENCY REPORT NUMBER:
11. SUPPLEMENTARY NOTES:
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution unlimited
12b. DISTRIBUTION CODE:


14. SUBJECT TERMS
15. NUMBER OF PAGES
16. PRICE CODE
17. SECURITY CLASSIFICATION OF REPORT: unlimited
18. SECURITY CLASSIFICATION OF THIS PAGE: unlimited
19. SECURITY CLASSIFICATION OF ABSTRACT: unlimited
20. LIMITATION OF ABSTRACT: unlimited


Abstract 


Most approaches to computer image segmentation group sets of pixels according to visible features of an image, such as edges, color, brightness, and curvature. Such approaches exploit specialized object properties to obtain satisfactory groupings, which can force those techniques to be domain specific. Furthermore, they do not provide a physical explanation for the image, nor do they group regions that have a single physical structure yet differing visible features.

This paper presents a new approach to segmentation using explicit hypotheses about the physics that creates images. We propose an initial segmentation that identifies image regions exhibiting constant color, but possibly varying intensity. For each region, hypotheses are proposed that specifically model the illumination, reflectance, and shape of the 3-D patch which caused that region. An image region may have many hypotheses simultaneously, and each hypothesis represents a distinct, plausible explanation for the color and intensity variation of that patch. Hypotheses for adjacent patches can be compared for similarity and merged when appropriate, resulting in more global hypotheses for grouping elementary regions.

This approach to segmentation has the potential to provide a list of possible explanations for a given image; to group together regions with coherent physical properties; and to provide a framework for applying specific operators such as shape-from-shading, color constancy, and roughness evaluation as part of the overall process of low-level vision. However, determining the most “plausible” explanations for a given image region raises many profound unsolved problems. In this paper, we present the approach, work through an example by hand, and discuss the implications of this approach for physics-based vision.




1.  Introduction 

The goal of physics-based segmentation is to find image regions that correspond to semantic scene elements. In practical terms, this means finding one or more physical descriptions of the illumination, materials, and geometry that created the image. In this presentation, we focus upon the problem of segmenting a single color image. That a solution exists for humans is obvious: an individual can look at a picture such as Figure 1 and not only comprehend what the picture is about, but also provide a fairly detailed physical description of the scene. We believe that postulating such a physical description is the key to understanding image data.

Early work in segmentation was based upon straightforward statistical models of the image data and did not search for the underlying semantic meaning. These models treated images as regions of uniform color and intensity, and variations in these characteristics as noise [6]. Researchers realized that using information about the scene was important, but they tried to incorporate such knowledge (for example, that trees are above and beside a road) on top of their statistical models [41].

The statistical approach was taken partly because of the optimism of the 70’s surrounding symbolic reasoning and artificial intelligence, which relegated to low-level vision the straightforward task of dividing an image into simple regions based upon color and brightness. More extensive low-level processing was considered unnecessary because it was assumed that programs using higher-level reasoning would be able to understand, identify, and merge these simple regions as appropriate [35].

In the mid-70’s, Horn proposed using physical models of image formation (the interaction of light and matter) to analyze and understand images [16]. Theoretically, using Horn’s model, some physical characteristics of a surface, including shape, could be estimated from a single image. Unfortunately, Horn’s model was limited to perfectly diffuse, perfectly reflecting surfaces (also called Lambertian surfaces) and point light sources, and it assumed a single surface and light source in the scene. Furthermore, it did not allow for noisy images or camera limitations (i.e., clipping of the color values to the camera’s range), so it was not easily applicable to real images.
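The sketch below shows the kind of image formation Horn’s early model assumed: a Lambertian surface under a single distant point source. It then adds the clipping to the camera’s range that the model did not account for. The albedo, the source intensity, and the 8-bit limit `max_value=255` are illustrative assumptions, not values taken from this report.

```python
import numpy as np

def lambertian_brightness(albedo, normal, light_dir, source_intensity, max_value=255.0):
    """Brightness of one Lambertian surface point lit by one distant point source.

    albedo, source_intensity and the sensor limit max_value are illustrative assumptions.
    """
    n = np.asarray(normal, dtype=float)
    s = np.asarray(light_dir, dtype=float)
    n = n / np.linalg.norm(n)
    s = s / np.linalg.norm(s)
    # Lambert's cosine law: brightness depends only on the angle between N and S,
    # not on the viewing direction; points turned away from the source stay dark.
    radiance = albedo * source_intensity * max(0.0, float(n @ s))
    # The camera limitation the text mentions: values beyond the sensor range are clipped.
    return min(radiance, max_value)

print(lambertian_brightness(0.5, [0, 0, 1], [0, 0, 1], 200.0))   # 100.0
print(lambertian_brightness(0.5, [0, 0, 1], [0, 0, 1], 1000.0))  # 255.0 (clipped)
```

In the second call the unclipped value would be 500. A method that inverts the cosine law without knowing about the clipping would therefore misjudge the surface orientation at that pixel.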

In the mid-80’s, Shafer’s dichromatic reflection model [33] allowed researchers to begin looking at a large class of actual materials: inhomogeneous dielectrics, which include paints, plastics, acrylics, ceramics, and paper. Klinker et al. [20][21] demonstrated the power of this model, and of the physics-based vision approach, by using it in tandem with a model for noise and camera effects to segment real images of inhomogeneous dielectrics.
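Some background on the model named above. In the dichromatic reflection model, light reflected from an inhomogeneous dielectric is the sum of an interface (surface) component and a body component. Each component is the product of a geometric scale factor and a fixed spectral composition. The minimal sketch below assumes RGB pixel values and neutral interface reflection, meaning the interface color equals the illuminant color. It shows the property that segmentation methods built on this model exploit. The pixels of one uniformly colored object lie on a line in color space where the object is matte, and in a plane where highlights are present. The colors and scale factors are hypothetical.

```python
import numpy as np

def dichromatic_pixels(body_color, interface_color, m_body, m_interface):
    """Colors predicted by the dichromatic reflection model.

    Each pixel is m_body * body_color + m_interface * interface_color: the scale
    factors depend only on geometry, the color vectors only on spectra.
    """
    c_b = np.asarray(body_color, dtype=float)           # shape (3,)
    c_s = np.asarray(interface_color, dtype=float)      # shape (3,)
    m_b = np.asarray(m_body, dtype=float)[:, None]      # shape (n, 1)
    m_s = np.asarray(m_interface, dtype=float)[:, None]  # shape (n, 1)
    return m_b * c_b + m_s * c_s                        # shape (n, 3)

rng = np.random.default_rng(0)
body = [0.8, 0.2, 0.1]   # hypothetical reddish paint
light = [1.0, 1.0, 1.0]  # hypothetical white illuminant (neutral interface assumed)
matte = dichromatic_pixels(body, light, rng.uniform(0, 1, 50), np.zeros(50))
glossy = dichromatic_pixels(body, light, rng.uniform(0, 1, 50), rng.uniform(0, 1, 50))
print(np.linalg.matrix_rank(matte), np.linalg.matrix_rank(glossy))  # 1 2
```

A pixel cluster that spans a third color dimension cannot be explained by one such object under one illuminant color. That is why colored interreflection and multi-colored objects break this assumption.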

Despite the power of this segmentation program, it was still applicable only to a limited class of images. Metals or multicolored objects could not be correctly segmented. Furthermore, the assumptions of Klinker et al. included a single color of illumination. This resulted in incorrect segmentations in regions with colored interreflection from nearby objects.

Finding solutions for these limitations was the next step in physics-based vision. Bajcsy et al. [2] attempted to model interreflection and improve the parameter estimation methods of Klinker et al. by using hue, saturation, and intensity. Brill [7] proposed a slightly different model for inhomogeneous dielectrics and demonstrated its use in segmentation. Healey [14] proposed the unichromatic reflection model for metals, and showed that it could be used with the dichromatic reflection model to segment images with both metals and inhomogeneous dielectrics under specific lighting conditions.

As a result of these efforts, the vision community could claim to segment images containing two materials (inhomogeneous dielectrics and metals) and images containing interreflection, but both methods had limitations. To correctly model interreflection, for example, a white reference plate was necessary in order to negate the effects of the global illumination. Furthermore, there are still a large number of materials and lighting conditions that cannot be handled by these models and their variations. More comprehensive reflection models, and models for different types of materials, are being researched, but no general reflection model yet exists (e.g., see [39], [29], [31], [8], and [15]). Up to the present, physics-based segmentation routines for single color images have been based upon one, or at most two, specific models of reflection with a set number of parameters. Furthermore, the issue of differing types of illumination has not been examined, and all of the major work in segmentation has assumed uniformly colored objects.

Figure 1. (Plate 1) A complex scene composed of numerous materials, textures, and shapes.

Figure 2. (Plate 2) An object, a mirror image of the object, and a picture of the object.

In parallel with this work, the computer vision community has looked into determining light source color [24], and has continued to work on determining shape, although mostly with range data (e.g., see [11] [26] [27]). Work in segmentation assumes that all of the objects in an image conform to the same model. Work in shape recovery, by contrast, uses model selection as well as parameter estimation: large families of models are initially considered for a set of data, and the best model is selected, along with the best estimate of its parameters.
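The toy sketch below illustrates the difference between estimating parameters within one model class and selecting among model classes. Polynomial model classes stand in for shape models. The degree is chosen with a BIC-style penalized score, one common criterion in the same spirit as Rissanen’s minimum description length. The text does not say which criterion the cited shape-recovery work uses, so both the model families and the score are assumptions.

```python
import numpy as np

def select_polynomial_degree(x, y, max_degree=4):
    """Pick a model class (polynomial degree) by a BIC-style score.

    score = n * log(RSS / n) + k * log(n): the first term rewards goodness of fit,
    the second penalizes the number k of free parameters in the model class.
    """
    n = len(x)
    scores = {}
    for degree in range(max_degree + 1):
        coeffs = np.polyfit(x, y, degree)  # parameter estimation within one class
        rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
        scores[degree] = n * np.log(rss / n) + (degree + 1) * np.log(n)
    best = min(scores, key=scores.get)     # model selection across classes
    return best, scores

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.05, x.size)  # hypothetical quadratic data
best, scores = select_polynomial_degree(x, y)
print(best, scores)
```

A pure parameter-estimation approach would fix the degree in advance. The penalty term is what lets a richer class win only when it explains the data substantially better.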

Recently, Breton et al. [5] have combined shape, light source direction, and material consistency into a single segmentation routine. They initially propose a family of models for light source direction and shape, but they assume a single model (Lambertian) for the reflectance properties of the material.

Unfortunately, none of these systems can deal with a picture such as Figure 1. It contains grey and colored metals reflecting multi-colored illumination, and numerous dielectrics with differing reflectance properties. In order to begin to understand general images such as this, the next logical step is to begin looking at the families of possible models for all three elements of a scene: illumination, reflectance, and shape or geometry. The need for a general model of illumination is apparent from the metal teapot on the right side of Figure 1. Unless we can explicitly model the illumination from all directions with respect to the surface of the teapot, we cannot understand that the color variation is due to both the material type (copper) and the illumination. As the previous discussion shows, a comprehensive reflectance model, or at least a specification of the space of possible models, is also necessary in order to segment general images.


Figure 3. (Plate 3) Image of uniformly colored
inhomogeneous dielectrics.

Figure 4. (Plate 4) Image of a single multi-colored
object.


In the past, researchers have approached the analysis of such images by postulating particular model equations and instantiating their parameters, taking discontinuities in the parameters as segmentation boundaries. Instead, we propose that the very forms of the models are to be instantiated, in order to accommodate qualitatively different shapes, materials, colors, and illumination environments. In this, we are moving the analysis from the primitive level 1 model of Rissanen [32] (estimating the parameters of a previously established model) to a level 3 analysis (selecting the model class), with a resultant increase in perceptual power.

From the above summary of work in physics-based segmentation, it is clear that model selection has only recently been examined by Breton et al., and only for illumination and shape. Model selection is necessary because of ambiguity in an image. As can be seen in Figure 2, there can be several different physical explanations for identical image regions. But what are the general models we should use? What are the parameters of the model classes we need to consider, and do we need to consider them all? If not, how do we choose an initial set of models, and how do they merge and interact?

These are the questions we deal with in this paper. In section 2 we present a general model, showing all of the possible parameters for the space of model classes. In section 3 we suggest a method for choosing a subset of the possible models with which to begin segmenting an image. Finally, in section 4 we propose a method for merging and analyzing the different model hypotheses to obtain a global segmentation.

This method allows both multiple explanations for the same image region and the grouping of regions that display coherence in one or more of the elements of their physical explanation. For example, current physics-based vision methods can segment images such as Figure 3 [21]. The discontinuities in color in Figure 4, however, cause those methods to fail for this common image. Only by using more general models for segmentation can the image region corresponding to the entire cup be proposed as a single semantic entity.

2.  A  General  Model  of  Image  Formation 

Images are formed when light strikes an object and reflects towards an imaging device such as a camera or an eye. The color and brightness of a point in an image are the result of the color and intensity of the incident light and of the shape and optical properties of the object. This section presents a new formal model of these elements, how they interact, and how they are related to what we see in an image.


2.1.  The  Elements  of  a  Scene 


The elements constituting our model of a scene are surfaces, illumination, and the light transfer function or reflectance of a point in 3-D space. These elements can be thought of as the intrinsic characteristics of a scene, as opposed to image features such as edges or regions of constant color [36]. We begin by providing a formal notation for each of these elements.


2.1.1. Surfaces

We model objects in the real world using 2-D manifolds we call surfaces. On a given surface, we can define local coordinates as a two-variable parameterization (u, v) relative to an arbitrary origin. The shape of the manifold in 3-D space is specified by a surface embedding function S(u, v) → (x, y, z), defined over an extent E ⊂ (u, v). The surface embedding function maps a point in the local coordinates of the manifold to a point in 3-D global coordinates. This global coordinate system is also anchored to an arbitrary origin, often specified relative to an imaging device. As shown in Figure 5, the surface embedding allows us to define a tangent plane T(u, v) and surface normal N(u, v) at each point on the manifold. This in turn defines a local 3-D coordinate system at each surface point, with two axes on the tangent plane and one in the direction of the surface normal. Other useful properties, such as curvature, can also be defined and specified for each point using the surface embedding function. Throughout this presentation we use wire-frame diagrams, such as Figure 5, to show the shape of a surface patch.
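To illustrate how the embedding yields the tangent plane and normal: the partial derivatives S_u and S_v span T(u, v), and their normalized cross product gives N(u, v). The sketch below estimates the derivatives by central differences. The unit-sphere embedding and the step size h are hypothetical choices made for illustration.

```python
import numpy as np

def sphere(u, v):
    """Example surface embedding S(u, v) -> (x, y, z): a unit sphere (hypothetical choice)."""
    return np.array([np.sin(u) * np.cos(v), np.sin(u) * np.sin(v), np.cos(u)])

def tangent_frame(embed, u, v, h=1e-5):
    """Tangent vectors S_u, S_v and unit normal N at (u, v), by central differences."""
    s_u = (embed(u + h, v) - embed(u - h, v)) / (2 * h)  # partial derivative dS/du
    s_v = (embed(u, v + h) - embed(u, v - h)) / (2 * h)  # partial derivative dS/dv
    n = np.cross(s_u, s_v)                               # perpendicular to the tangent plane
    return s_u, s_v, n / np.linalg.norm(n)

s_u, s_v, n = tangent_frame(sphere, 0.7, 1.2)
print(np.allclose(n, sphere(0.7, 1.2), atol=1e-6))  # True: a sphere's normal points radially
```

Any other embedding function with the same signature can be passed in, which fits the view developed in the following paragraphs that the choice of surface belongs to the modeler.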

It is important to note that we do not view the world as consisting of surfaces to be found, but as objects to be modeled. It is commonly presumed in machine vision that “surfaces” exist in nature, and that the job of the vision system is to discover them. We reject that view, believing instead that surfaces are artifacts of the interpretation process and exist only within the perceptual system that is attempting to build a model of the world. Given this view, there is no “correct” surface with which to model an object. Instead, the choice of which manifold and surface embedding function will be used to represent a given object is made by the modeler, and depends largely upon the task and information at hand. Given a brick wall, for example, if the application is obstacle avoidance, a single plane could be chosen to model the entire wall. For other situations, such as segmentation, it might be necessary to model each brick as well as the troughs between them. At an even smaller scale, understanding the image texture in detail may require a model of each bump on each brick in order to interpret the wall. All are potentially useful “surfaces” to model the same wall, and all might be needed at various points in the visual process. Thus one object in the world can be modeled by many different surfaces, and the choice of model, or surface, is made by the interpreter. This view allows us to conceive of a perceptual process that incorporates numerous differing surfaces to describe an object. Other computational vision systems, which seek a single “correct” surface, lack this important capability.

In order to parameterize light striking and reflecting from a surface, we also need a parameterization of direction. In the global coordinate system we use two angles (θ_x, θ_y), where θ_x is the angle between the direction vector and the x-axis, and θ_y is the angle between the direction vector and the y-axis. To specify directions in the local coordinate systems, we use standard spherical coordinates, as shown in Figure 6, given by the ordered pair (θ, φ). θ is the polar angle, defined as the angle between the surface normal and the direction. φ is the azimuth, defined as the angle between a perpendicular projection of the ray onto the tangent plane and a reference line on the surface (usually either the u or the v axis).
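The sketch below makes the local direction parameterization concrete. It builds an orthonormal frame from the surface normal and a reference line, then converts between (θ, φ) and 3-D unit vectors. The function names, and the choice b = N × t for the second tangent axis (which makes the frame right-handed), are assumptions for illustration.

```python
import numpy as np

def local_frame(normal, reference):
    """Orthonormal frame (t, b, n): n is the unit surface normal, t is the reference
    line (e.g. the u axis) projected onto the tangent plane, and b = n x t."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    r = np.asarray(reference, dtype=float)
    t = r - (r @ n) * n  # remove the component along the normal
    t = t / np.linalg.norm(t)
    return t, np.cross(n, t), n

def direction_from_local(theta, phi, frame):
    """Unit vector with polar angle theta (from the normal) and azimuth phi (from t)."""
    t, b, n = frame
    return np.sin(theta) * (np.cos(phi) * t + np.sin(phi) * b) + np.cos(theta) * n

def local_from_direction(d, frame):
    """Inverse mapping: recover (theta, phi), with phi in [0, 2*pi), from a unit vector."""
    t, b, n = frame
    theta = np.arccos(np.clip(d @ n, -1.0, 1.0))
    phi = np.arctan2(d @ b, d @ t) % (2 * np.pi)
    return theta, phi

frame = local_frame([0.0, 0.0, 1.0], [1.0, 0.0, 0.0])
d = direction_from_local(np.radians(30), np.radians(90), frame)
print(d, local_from_direction(d, frame))
```

Directions with θ > π/2 point below the tangent plane. These are relevant only for the transparent surfaces discussed in the next section.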




2.1.2. Illumination

Much research in machine vision assumes a single light source, often a relatively large distance away from the scene being imaged. However, many visual phenomena arise because of reflection from nearby objects acting as additional light sources. The field of computer graphics has long incorporated this idea into systems such as ray tracing and radiosity. In machine vision, interreflection has been studied between two objects, but no general model yet exists for specifying the totality of illumination on a surface point.

To begin examining general images, we cannot assume point lighting, three independent light sources, or other constructed illumination setups. A general model must allow us to specify any type of illumination, including interreflection from other objects, and still have identifiable subsets that fit with our traditional conceptions of illumination. We develop our model by first defining and specifying the parameters of a single ray of light, then extending this model to describe the light arriving at a point.

A photon is a quantum of light energy that moves in a single direction unless something (like matter, or a strong gravity field) affects its motion. Thanks to the sun and artificial light sources, there are many photons moving in many directions at any given time. Collections of photons moving in the same direction at the same place and time constitute rays of light. As photons move, they oscillate about their direction of travel at a spectrum of wavelengths λ, which specify the distance traveled in a single oscillation. The human eye is sensitive to photons with wavelengths between approximately 380 and 760 nm, and the spectral distribution of wavelengths present in a collection of photons determines what color we see. A charge-coupled device (CCD) camera responds to a slightly different range of wavelengths, and infrared filters are normally used to approximately match the color response of the human eye. The polarization of a population of photons specifies the orientation of their oscillation with respect to the direction of travel, and it can affect the manner of reflection and transmission when light interacts with matter. Polarization is commonly represented using a set of parameters, such as the Stokes parameters [4]. We indicate these by the variable s ∈ {1, 2, 3, 4}, which indexes the Stokes parameters to specify the relative energy of photons oscillating at different orientations.
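The Stokes parameters can be estimated from intensities measured behind ideal polarizing filters. The sketch below uses the standard textbook relations. It assumes that the index s = 1, 2, 3, 4 used above corresponds to S0, S1, S2, S3; that mapping is our assumption. The sign convention for S3 (right versus left circular) also varies between sources.

```python
def stokes_from_measurements(i0, i45, i90, i135, i_right, i_left):
    """Stokes vector (S0, S1, S2, S3) from intensities measured behind ideal linear
    polarizers at 0, 45, 90 and 135 degrees and right/left circular analyzers."""
    s0 = i0 + i90           # total intensity
    s1 = i0 - i90           # horizontal versus vertical linear polarization
    s2 = i45 - i135         # +45 versus -45 degree linear polarization
    s3 = i_right - i_left   # right versus left circular polarization
    return s0, s1, s2, s3

def degree_of_polarization(s0, s1, s2, s3):
    """Fraction of the intensity that is polarized (0 = unpolarized, 1 = fully polarized)."""
    if s0 <= 0:
        raise ValueError("S0 (total intensity) must be positive")
    return (s1**2 + s2**2 + s3**2) ** 0.5 / s0

unpolarized = stokes_from_measurements(0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
horizontal = stokes_from_measurements(1.0, 0.5, 0.0, 0.5, 0.5, 0.5)
print(unpolarized, degree_of_polarization(*unpolarized))  # (1.0, 0.0, 0.0, 0.0) 0.0
print(horizontal, degree_of_polarization(*horizontal))    # (1.0, 1.0, 0.0, 0.0) 1.0
```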

In a scene, light is being emitted or reflected in numerous directions, entering and leaving points throughout the area of interest. Using the parameters described above, a single ray of light at time t at position (x, y, z), moving in direction (θ_x, θ_y), of wavelength λ and polarization s, can be specified by the 8-tuple (x, y, z, θ_x, θ_y, λ, s, t).
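One way to make the 8-tuple concrete is a small immutable record. The field names, the nanometer unit for λ, and the validity checks are illustrative choices of our own. The visible range is the approximate 380 to 760 nm quoted above.

```python
import math
from dataclasses import dataclass

VISIBLE_NM = (380.0, 760.0)  # approximate human sensitivity range quoted in the text

@dataclass(frozen=True)
class Ray:
    """A ray of light as the 8-tuple (x, y, z, theta_x, theta_y, lambda, s, t)."""
    x: float
    y: float
    z: float
    theta_x: float        # angle (radians) between the ray direction and the global x-axis
    theta_y: float        # angle (radians) between the ray direction and the global y-axis
    wavelength_nm: float  # lambda
    s: int                # index of the Stokes parameter, 1..4
    t: float              # time

    def __post_init__(self):
        if self.s not in (1, 2, 3, 4):
            raise ValueError("s must index one of the four Stokes parameters (1-4)")
        # Direction cosines of a unit vector satisfy cos^2(theta_x) + cos^2(theta_y) <= 1.
        if math.cos(self.theta_x) ** 2 + math.cos(self.theta_y) ** 2 > 1.0 + 1e-12:
            raise ValueError("theta_x and theta_y do not describe a valid direction")

    def is_visible(self) -> bool:
        return VISIBLE_NM[0] <= self.wavelength_nm <= VISIBLE_NM[1]

# Perpendicular to both x and y, i.e. travelling along the z-axis
# (the two angles alone do not fix the sign of the z-component).
ray = Ray(0.0, 0.0, 1.0, math.pi / 2, math.pi / 2, 550.0, 1, 0.0)
print(ray.is_visible())  # True
```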

For the purposes of image formation, we want to specify the intensity of visible light that is incident from all directions on points (x, y, z) in global 3-D coordinates. We can describe the light energy arriving at a point from all directions by the incident light energy field function L^i(x, y, z, θ_x, θ_y, λ, s, t). This function specifies the radiant intensity, or radiance per unit solid angle, of light incoming to the point (x, y, z) from direction (θ_x, θ_y), of wavelength λ and Stokes parameter s, at time t. It is similar to the plenoptic function defined in [1], or the helios function [28]. In this paper we consider only single pictures taken at time t. Time is therefore a constant, and we can drop it from our parameterization of illumination functions. As a result, we consider only the subspace of the incident light energy field L^i(x, y, z, θ_x, θ_y, λ, s).

For a point in free space, we note that rays arriving at that point can be mapped onto a sphere of unit radius [9]. In this manner, the incident light on a surface point can be visualized on the unit sphere. The brightness and color of a point (θ_x, θ_y) on the sphere indicate the brightness and color of the incident light from that direction. We define this representation of the light energy field on the unit sphere for a 3-D point (x, y, z) to be the global illumination environment [GIE] for that point. It is important to note that on opaque surfaces some of the incident light is blocked by the object matter itself, limiting the illumination environment to the hemisphere above the tangent plane. If the surface is transparent, the illumination environment is the complete sphere, since light can be incident on the surface point from below as well as above. We can visualize the illumination environment for opaque surfaces by orthogonally projecting it onto a plane, as in Figure 7. Several common illumination environments visualized this way are shown in Figure 8, Figure 9, and Figure 10. The inset image beside each figure shows a simple example of a scene under that illumination environment.
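The sketch below shows the orthogonal projection used to draw an opaque surface’s illumination environment. A hemisphere direction (θ, φ) lands at (sin θ cos φ, sin θ sin φ) inside the unit disk, so each disk pixel can be shaded with the radiance arriving from its direction. The radiance function here is hypothetical: a white source within 10° of the normal, similar in spirit to the overhead source of Figure 8.

```python
import numpy as np

def render_environment(radiance, size=65):
    """Orthographic view of a hemispherical illumination environment.

    Pixel (p, q) inside the unit disk corresponds to the direction with
    sin(theta) = sqrt(p^2 + q^2) and phi = atan2(q, p); pixels outside the
    disk correspond to no direction and are left at 0.
    """
    coords = np.linspace(-1.0, 1.0, size)
    p, q = np.meshgrid(coords, coords)
    r = np.hypot(p, q)
    inside = r <= 1.0
    theta = np.arcsin(np.clip(r, 0.0, 1.0))
    phi = np.arctan2(q, p)
    image = np.zeros((size, size))
    image[inside] = radiance(theta[inside], phi[inside])
    return image

def overhead_source(theta, phi):
    """Hypothetical environment: white light within 10 degrees of the normal, dark elsewhere."""
    return np.where(theta < np.radians(10.0), 1.0, 0.0)

env = render_environment(overhead_source)
print(env[32, 32], env[0, 0])  # 1.0 0.0: bright at the zenith, empty outside the disk
```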

If we substitute the local surface coordinates (u, v) for the global coordinates (x, y, z), and the local spherical coordinates (θ, φ) for the global axis angles, we obtain the local incident light energy field L^i(u, v, θ, φ, λ, s). For opaque surfaces, this field can also be visualized on a hemisphere above the tangent plane at the local surface point. We call this representation the local illumination environment [LIE] for the surface point (u, v). Note that the global and local illumination functions are distinguished by their parameters.

Figure 7. Orthogonal mapping of the illumination environment onto a plane.

Figure 8. (Plate 5) Illumination environment for inset image: orthogonal mapping of a white light source directly overhead.

Figure 9. (Plate 6) Blue ambient light and a white circular source to the right and behind.

Figure 10. (Plate 7) Grey ambient light with red light reflected off another object.


The total radiance of a patch of the illumination environment hemisphere with polarization specification s at wavelength λ, specified by the angles (θ, φ) and subtending dθ and dφ, is given by L^i(u, v, θ, φ, s, λ) sin θ dθ dφ dλ [16]. The total irradiance at a point (u, v) is given by (1). The sine term is part of the solid angle specification, and the cosine term reflects the foreshortening effect as seen by the surface point.


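$$E(u,v) = \int_{0}^{\infty}\int_{0}^{2\pi}\int_{0}^{\pi/2} L^{i}(u,v,\theta,\phi,s,\lambda)\,\cos\theta\,\sin\theta\;d\theta\,d\phi\,d\lambda \qquad (1)$$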
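Equation (1) can be checked numerically. The sketch below applies the midpoint rule over θ ∈ [0, π/2] and φ ∈ [0, 2π]. It works at a single wavelength and polarization component, so the λ integral is dropped; that restriction is an assumption of this sketch. The test case is a hypothetical uniform hemisphere of unit radiance. For constant radiance L, the angular integral evaluates analytically to E = πL.

```python
import numpy as np

def irradiance(radiance, n_theta=200, n_phi=400):
    """Midpoint-rule estimate of the angular part of equation (1) at one wavelength.

    radiance(theta, phi) plays the role of L^i(u, v, theta, phi, s, lambda) for a
    fixed surface point, polarization component and wavelength (an assumption).
    """
    d_theta = (np.pi / 2) / n_theta  # theta covers [0, pi/2]: opaque surface
    d_phi = (2 * np.pi) / n_phi      # phi covers [0, 2*pi]
    theta = (np.arange(n_theta) + 0.5) * d_theta
    phi = (np.arange(n_phi) + 0.5) * d_phi
    th, ph = np.meshgrid(theta, phi, indexing="ij")
    # cos(theta): foreshortening; sin(theta): part of the solid-angle element.
    integrand = radiance(th, ph) * np.cos(th) * np.sin(th)
    return float(integrand.sum() * d_theta * d_phi)

def uniform_sky(theta, phi):
    """Hypothetical environment: unit radiance from every direction of the hemisphere."""
    return np.ones_like(theta)

print(irradiance(uniform_sky) / np.pi)  # should be very close to 1.0
```

The cosine weighting means light arriving near grazing angles (θ close to π/2) contributes little to the irradiance, even when the environment is bright there.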


2.1.3. Reflectance and the Light Transfer Function

In order for a point on a surface to be visible to an imaging system, there must be some emission of light from that point. As with the incident light energy field, we are interested in describing the light energy that is leaving a surface point (x, y, z) in every direction (θ_x, θ_y), in polarization state s, for every wavelength λ. The light leaving a point is specified by the exitant light energy field L^e(x, y, z, θ_x, θ_y, s, λ). This function has the same parameterization as the incident light energy field, and describes an intensity for every direction and wavelength. As with the incident light energy field, we can define a local coordinate version of the exitant light energy field, L^e(u, v, θ, φ, s, λ).




Figure 11. Some Special Cases of the Light Transfer Function: Fluorescence, Polarization, Transmittance, and Specular or Surface Reflection


The relationship between the incident and exitant light energy fields depends upon the macroscopic, microscopic, and atomic characteristics of the point the light strikes. The gross characteristics of this relationship are what allow us to identify and describe surfaces in a scene. Formally, the incident and exitant light energy fields are related by the reflectance, or global light transfer function ℜ(x, y, z; θ_x^i, θ_y^i, s^i, λ^i; θ_x^e, θ_y^e, s^e, λ^e; t). It gives the exitant light energy field L^e(x, y, z, θ_x, θ_y, s, λ) produced by one unit of incident light from direction (θ_x^i, θ_y^i), of polarization s^i and wavelength λ^i, for a particular surface point (x, y, z) at time t. So that we can drop time from the parameterization, we assume surfaces whose transfer functions do not change. An alternative form of the light transfer function is obtained by substituting the local coordinates (u, v, θ, φ) for the global parameters (x, y, z, θ_x, θ_y). This yields the local light transfer function ℜ(u, v; θ^i, φ^i, s^i, λ^i; θ^e, φ^e, s^e, λ^e).

The relationship between the incident light energy, the exitant light energy, and the transfer function can be written using local coordinates as the integral in (2). This integral says that the exitant light energy field is the sum of the self-luminance of the point, L_0, and the product of the transfer function and the incident light energy field integrated over the parameters of the incident light. The cosine term is due to foreshortening, and the sine term comes from the solid angle specification. The result of this integral is a function of the exitant light variables.

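$$L^e(u, v; \theta^e, \phi^e, s^e, \lambda^e) = L_0(u, v; \theta^e, \phi^e, s^e, \lambda^e) + \sum_{s^i}\int_{\lambda^i=0}^{\infty}\int_{\phi^i=0}^{2\pi}\int_{\theta^i=0}^{\pi/2} L^i(u, v, \theta^i, \phi^i, s^i, \lambda^i)\,\Re(u, v; \theta^i, \phi^i, s^i, \lambda^i; \theta^e, \phi^e, s^e, \lambda^e)\,\cos\theta^i\,\sin\theta^i\; d\theta^i\, d\phi^i\, d\lambda^i \qquad (2)$$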
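To make the structure of (2) concrete, here is a minimal discretization for one exitant direction. It assumes a single wavelength and polarization state (so the sum over s^i and the integral over λ^i collapse) and assumes ℜ is expressed per steradian, like a BRDF; the function and parameter names are hypothetical. With uniform unit incident light and a Lambertian-style transfer function ℜ = ρ/π, the result should be approximately ρ, plus any self-luminance L_0.

```python
import math


def exitant_radiance(incident, transfer, theta_e, phi_e,
                     self_luminance=0.0, n_theta=100, n_phi=200):
    """Discretized form of (2) for one exitant direction (theta_e, phi_e).

    incident: callable L^i(theta_i, phi_i), one wavelength and polarization.
    transfer: callable R(theta_i, phi_i, theta_e, phi_e), assumed per steradian.
    """
    d_theta = (math.pi / 2) / n_theta
    d_phi = (2 * math.pi) / n_phi
    total = 0.0
    for i in range(n_theta):
        theta_i = (i + 0.5) * d_theta
        # cos (foreshortening) * sin (solid angle) * cell size
        w = math.cos(theta_i) * math.sin(theta_i) * d_theta * d_phi
        for j in range(n_phi):
            phi_i = (j + 0.5) * d_phi
            total += incident(theta_i, phi_i) * transfer(theta_i, phi_i, theta_e, phi_e) * w
    return self_luminance + total
```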

A structured analysis of the transfer function shows how it subsumes several common special cases, sketched in Figure 11. We give a brief description of the parameter constraints that correspond to these special cases: fluorescence, polarization, transmittance, and surface or specular reflection. These descriptions demonstrate the framework provided by the general transfer function.

• For a non-fluorescing surface, if the incident light is of wavelength λ_0, then the exitant light energy field will also have wavelength λ_0, and no other wavelengths will be present. If, on the other hand, the same incident light strikes a fluorescent surface, there may be other wavelengths present in the exitant light energy field. In terms of the parameters of the transfer function, fluorescence implies there exists some pair of wavelengths where λ^e ≠ λ^i for which ℜ > 0.

• Polarizing transfer functions modify the polarization of the incoming light. This effect can be seen in sunglasses, which often block the horizontal polarization mode. For non-polarizing surfaces, ℜ = 0 whenever s^i ≠ s^e. For a polarizing transfer function, there exists some pair of Stokes parameters (s^i, s^e) where s^e ≠ s^i for which ℜ > 0.

• Transmitting surfaces allow some light to pass through them. Conversely, an opaque surface limits both the incident and exitant light energy fields to a hemisphere above the tangent plane for that surface. Transmittance occurs when either the exitant or incident light energy field bounds, (θ^e, φ^e) and (θ^i, φ^i), are extended beyond the hemisphere above the tangent plane of the surface, implying that at least some of the exitant or incident light energy is passing through the material. In terms of the parameters, a surface is transmitting if ℜ > 0 when θ^e > 90° or ℜ > 0 when θ^i > 90°.

Figure 12. (Plate 8) Illustration of the transfer function for a slightly rough metal object.

Figure 13. (Plate 9) Illustration of the transfer function for a slightly rough plastic object.

• Specular reflection, described in more detail in Section 3.2.4, occurs when the incident light is reflected only about the local surface normal, in the perfect specular direction. This restriction implies that the transfer function is zero except when θ^e = θ^i and φ^e = φ^i + π. It is important to note that surface reflection is relative to the local surface normal, and it is possible to have an optically rough surface where the local surface normals vary relative to the overall surface [3][38].

• Finally, Lambertian surfaces (also called perfectly diffusing perfect reflectors) reflect incident light equally in all directions. For a unit energy ray of light from direction (θ, φ), the exitant light energy in all directions is specified by the expression cos θ.
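The parameter constraints in the list above can be read as simple predicates over a sampled transfer function. The sketch below assumes ℜ is available as a list of hypothetical samples, with angles in degrees measured from the local surface normal and polarization states encoded as opaque labels. It only checks the stated conditions on the samples it is given, so a negative answer is not a proof about the continuous function.

```python
from typing import Iterable, NamedTuple


class Sample(NamedTuple):
    """One sampled value of a transfer function (hypothetical encoding)."""
    theta_i: float  # incident polar angle, degrees from the local normal
    phi_i: float    # incident azimuth, degrees
    s_i: str        # incident polarization state label
    lam_i: float    # incident wavelength, nm
    theta_e: float  # exitant polar angle, degrees
    phi_e: float    # exitant azimuth, degrees
    s_e: str        # exitant polarization state label
    lam_e: float    # exitant wavelength, nm
    value: float    # sampled transfer function value


def is_fluorescent(samples: Iterable[Sample]) -> bool:
    # Some light leaves at a wavelength different from the one that arrived.
    return any(s.value > 0 and s.lam_e != s.lam_i for s in samples)


def is_polarizing(samples: Iterable[Sample]) -> bool:
    # Some light leaves in a polarization state different from the incident one.
    return any(s.value > 0 and s.s_e != s.s_i for s in samples)


def is_transmitting(samples: Iterable[Sample]) -> bool:
    # Energy arrives from, or leaves toward, beyond the tangent-plane hemisphere.
    return any(s.value > 0 and (s.theta_e > 90 or s.theta_i > 90) for s in samples)


def is_specular_only(samples: Iterable[Sample], tol: float = 1e-9) -> bool:
    # Non-zero only where theta_e = theta_i and phi_e = phi_i + 180 degrees.
    def mirror(s: Sample) -> bool:
        return (abs(s.theta_e - s.theta_i) < tol
                and abs((s.phi_e - s.phi_i) % 360 - 180) < tol)
    return all(s.value <= 0 or mirror(s) for s in samples)
```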

To illustrate a transfer function, we show a sphere with that transfer function in the environment shown in Figure 12 and Figure 13. The sphere sits above a matte black and white checkered surface under a dark grey sky, with a white point light source shining on it from above and to the right of the viewer. Because all illumination is of uniform spectrum (i.e., grey), any color in the image is due to the transfer function. The checkerboard pattern is present to highlight the specularity of the object. Figure 12 is an illustration of a highly specular material with no body reflection, and Figure 13 shows a matte colored material with a small amount of surface reflection.


General  Hypotheses  of  Physical  Appearance 


We have defined a 3-D world model for individual points and their optical properties, but how does a whole surface appear in a digitized computer image? To describe a surface and its appearance, we introduce a nomenclature for the aggregation of appearance properties in the 3-D world and for how these aggregations map to an image.

We have defined surfaces with an extent and an embedding, and we have defined a transfer function ℜ over a surface. We define the combination of a surface and a transfer function to be a surface patch. Because the transfer function can vary arbitrarily, there are no constraints on the appearance of a general surface patch in an image. Frequently, however, the transfer function at nearby points on a surface displays some type of identifiable coherence. Coherence does not imply uniformity; it covers a broad scope of possible aggregations such as uniformity, repetitive patterns, or irregular textures. Some properties that commonly impart coherence include material type, color, roughness, and the index of refraction. We can model the coherence of an object's appearance with a surface patch whose transfer function is similarly coherent.


Figure 14. Mapping from an appearance patch to a hypothesis region (panels: appearance patch, hypothesis region).


A surface patch with a coherent transfer function, however, will not always display that coherence in an image. Differing illumination over the surface patch, or occluding objects, can mask or modify the appearance of the patch to an imaging system. For the purposes of image analysis, we would like to specify not only coherence in the transfer function, but also coherence in the exitant light energy field, which is what is viewed by the imaging device. To achieve coherence in the exitant light energy field, we must add to the surface/transfer function pair a coherent illumination environment over the surface patch. We define this combination as an appearance patch: a surface patch whose points exhibit a coherent transfer function and illumination environment, whose exitant light energy field exhibits a coherence related to that of the transfer function over the entire patch, and which is not occluded from the imaging system.

Given an appearance patch, we can imagine that the exitant light energy field over the patch maps to a set of pixels in the image. As sketched in Figure 14, the exitant light from a surface caught by the imaging device determines the color and position of the set of pixels related to that surface. We define the physical explanation for a given exitant light energy field from a given patch to be a hypothesis H = (S, E, ℜ, L^i). The four elements of a hypothesis are the surface embedding S, the surface extent E, the transfer function ℜ, and the incident light energy field L^i. With these functions, it is possible to completely determine the exitant light energy field (assuming no self-luminance). The basic connection between a physical explanation and a group of image pixels is provided by a hypothesis region HR = (P, H), defined as a set of pixels P that are the image of the hypothesis H. The combination of the hypothesis elements represents an explanation for the color and brightness of every pixel in the image patch. For simplicity, we assume the image is formed by a pinhole camera at the origin looking at the canonical view volume. To represent the fact that a single region may have more than one possible explanation, we define a hypothesis set HS = (P, H_1, ..., H_n) to be a set of pixels P with an associated list of hypotheses H_1, ..., H_n, where each hypothesis H_i provides a unique explanation for all of the pixels in P, and only the pixels in P.


Finally, given a set of hypothesis sets HS_i for pixel regions P_i, we define a segmentation of the pixel set P = ∪_i P_i to be a set of hypotheses, containing one hypothesis from each HS_i, that explains the values of the pixels in P. Of course, to be physically realizable, these hypotheses must be mutually consistent. The goal of low-level vision, in terms of our vocabulary, is to produce one or more segmentations of the entire image.
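The vocabulary above maps naturally onto simple data structures. The following is a sketch only: hypothesis elements are stand-in labels rather than real surface, transfer, and illumination models, and the `consistent` predicate is a hypothetical placeholder for the mutual-consistency test, which the text leaves open. A segmentation picks one hypothesis from each hypothesis set and keeps the combinations that pass the test.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, FrozenSet, Iterator, List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, column)


@dataclass(frozen=True)
class Hypothesis:
    """H = (S, E, ℜ, L^i); each element is a placeholder label in this sketch."""
    surface: str       # S, surface embedding
    extent: str        # E, surface extent
    transfer: str      # ℜ, transfer function
    illumination: str  # L^i, incident light energy field


@dataclass(frozen=True)
class HypothesisRegion:
    """HR = (P, H): pixels P that are the image of hypothesis H."""
    pixels: FrozenSet[Pixel]
    hypothesis: Hypothesis


@dataclass
class HypothesisSet:
    """HS = (P, H_1, ..., H_n): alternative explanations for the same pixels."""
    pixels: FrozenSet[Pixel]
    hypotheses: List[Hypothesis]


def segmentations(
    sets: Sequence[HypothesisSet],
    consistent: Callable[[Tuple[Hypothesis, ...]], bool],
) -> Iterator[List[HypothesisRegion]]:
    """Yield every choice of one hypothesis per set that passes `consistent`."""
    for choice in product(*(hs.hypotheses for hs in sets)):
        if consistent(choice):
            yield [HypothesisRegion(hs.pixels, h) for hs, h in zip(sets, choice)]
```

Enumerating the full product grows exponentially with the number of regions, so this is only a statement of the definition, not a practical search.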


To illustrate a hypothesis, we combine the representations developed previously into a 3-panel image displaying the characteristics of S, L, and ℜ. Returning to the image of the cup in Figure 4, we can examine a single uniformly colored region (shown at the far left of Figure 15) and visualize two hypotheses for it: a mirror reflecting some illumination environment, or a plastic object under white illumination. We illustrate the metal hypothesis in Figure 15 and the plastic hypothesis in Figure 16. Both hypotheses describe the same image region, and together they form a hypothesis set.


3.  Fundamental  Hypothesis  Regions 


The difficulty inherent in segmentation using physical descriptions lies in determining the correct mapping between the image and the scene that created it. The problem is that for a single pixel in an image, there are an infinite number of physical explanations for its color, and in isolation, it is not possible to distinguish between those explanations. A single red pixel can be a red object under white light, a white object under red light, a mirror reflecting red light, a mirror reflecting a red object, or numerous other possibilities, and it is impossible to discriminate between them given only the one pixel value. Fortunately, we are not analyzing pixels in isolation, but images, which represent collections of appearance patches from the real world. These appearance patches possess coherence in their transfer function and their illumination environment. The segmentation process is thus the act of identifying which sets of pixels correspond to which appearance patches, identifying the possible physical explanations for those patches, and then merging them with other appearance patches when their possible physical explanations are compatible in some identifiable fashion.

Figure 15. (Plate 10) Illustration of a metal hypothesis: (a) actual region (from Figure 4), (b) wire frame surface representation (planar), (c) illumination environment (diffuse), (d) transfer function (metal).

Figure 16. (Plate 11) Illustration of a dielectric hypothesis: (a) actual region (from Figure 4), (b) wire frame surface representation (planar), (c) illumination environment (diffuse), (d) transfer function (dielectric).

Such a concept for segmentation is not new: for example, Klinker et al. [20] and Healey [14] both identified regions of similarity in some physical properties. What is new in this presentation is the generality. These past works assumed that the scene obeyed certain properties and looked only for a single, narrowly defined kind of coherence. In our new approach, the general illumination and transfer functions allow us to represent, reason about, and discover many different kinds of coherence in a single image. This capability is necessary for the analysis of natural or common man-made scenes such as Figure 1.


3.1. Pixel Classification


The first step in segmentation is to identify pixel regions that display coherence in some feature space. In a color image, the most obvious characteristic linking together groups of pixels is their color. The smallest such groupings are aggregates of pixels with identical color. A reasonable starting assumption might be that a set of connected pixels with the same color corresponds to a single appearance patch within a scene. We believe, however, that using regions of uniform color overlooks much of the information contained in the image.


Figure 17. (Plate 12) Mug divided into idealized uniform chromaticity regions.

3.1.1. Uniform Chromaticity Regions

An approach of slightly greater complexity is to group together pixels displaying the same basic color ratios, or chromaticity, but with varying brightness. Mathematically, chromaticity is defined by "normalized color" coordinates, as defined in (3) [19]. Chromaticity can also be thought of as the hue and saturation of a color without the intensity information.


$$(\bar{r}, \bar{g}, \bar{b}) = \left(\frac{r}{r+g+b},\; \frac{g}{r+g+b},\; \frac{b}{r+g+b}\right) \qquad (3)$$
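Equation (3) in code, as a minimal sketch with an illustrative function name. Scaling a pixel's brightness leaves its chromaticity unchanged, which is exactly the property a uniform chromaticity region relies on; black pixels, where r + g + b = 0, have no defined chromaticity.

```python
def chromaticity(r: float, g: float, b: float) -> tuple:
    """Normalized color coordinates from (3)."""
    total = r + g + b
    if total == 0:
        raise ValueError("chromaticity is undefined for a black pixel (r + g + b = 0)")
    return (r / total, g / total, b / total)
```

For example, (200, 100, 100) and (100, 50, 50) differ only in brightness and both map to (0.5, 0.25, 0.25).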


We define a uniform chromaticity region [UCR] to be a connected set of pixels that possess uniform chromaticity and possibly varying brightness. A UCR corresponds to a linear cluster, as defined by Klinker et al. [21]. As such, a more general definition of a UCR is a connected set of pixels whose covariance matrix in color space has a single non-zero eigenvalue, whose eigenvector is related to the chromaticity of the region. Because it allows for varying brightness within a region, a UCR is able to capture more of the relevant coherence between neighboring pixels than simple uniform regions.
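The covariance-based definition of a UCR can be checked directly. The sketch below uses NumPy's `cov` and `eigh`; the eigenvalue-ratio tolerance is a hypothetical parameter, connectivity is not tested, and reading chromaticity off the principal eigenvector assumes the cluster is a line through the origin (pixel values proportional to one color), which is the idealized case.

```python
import numpy as np


def linear_cluster_chromaticity(pixels, ratio_tol=1e-3):
    """Test whether RGB pixels form a linear cluster and estimate its chromaticity.

    pixels: (N, 3) array-like of RGB values from one region.
    ratio_tol: hypothetical bound on (second eigenvalue / largest eigenvalue).
    Returns (is_linear, chromaticity or None).
    """
    x = np.asarray(pixels, dtype=float)
    cov = np.cov(x, rowvar=False)           # 3x3 covariance in color space
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    largest = eigvals[-1]
    if largest <= 1e-12:
        # No variation at all: a constant-color region has one chromaticity.
        mean = x.mean(axis=0)
        s = mean.sum()
        return True, (tuple(mean / s) if s > 0 else None)
    is_linear = eigvals[-2] / largest < ratio_tol
    direction = eigvecs[:, -1]              # principal axis of the cluster
    total = direction.sum()
    if abs(total) < 1e-12:
        return bool(is_linear), None        # axis not interpretable as a color
    # Dividing by the sum fixes both the scale and the arbitrary eigenvector sign.
    return bool(is_linear), tuple(direction / total)
```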


Klinker et al. [20] note that a UCR, or linear cluster, can represent two distinct objects if both are dark or poorly illuminated. In our segmentation method, however, we initially assume that a UCR represents a single surface patch under a single illumination environment. This requires a form of coherence from the physical elements generating the UCR. Clearly, it is possible to construct an image with UCRs that do not have such coherence in the physical world, and we realize that our current approach will not correctly handle such situations.

The benefit derived from using UCRs is that they are groupings of pixels that we can reasonably assume to correspond to a single appearance patch in the physical world, setting constraints on the associated hypotheses. These constraints are that, over the patch, the transfer functions are coherent and the illumination environments are similar. Because it is a single appearance patch, it is, by definition, a single surface. Figure 17 shows an idealization of the cup image divided into UCRs.


By identifying UCRs in the image, we have taken the first step in the segmentation process, linking pixels with appearance patches in the scene. The next step is to begin to identify the relevant physical explanations, or hypotheses, for the appearance patches corresponding to the identified UCRs.


3.2.  Generating  Hypotheses 


If a UCR does correspond to a single appearance patch in the scene, what are the possible hypotheses for that appearance patch, given the constraints identified previously? Clearly, the relationship between appearance patches and hypotheses is not one-to-one. As demonstrated by Figure 2, it is possible for identical image regions to have differing hypotheses specifying their physical description. Therefore, given a UCR and its related appearance patch, we must consider multiple physical descriptions.

The first question we examine is how many physical descriptions must be considered. We begin to answer this question by noting that a UCR has two characteristics that make it interesting: it is not necessarily white, and it is not necessarily of uniform intensity. Any hypothesis that explains a UCR has to explain what element or elements are causing the color and the brightness variation.

The possible sources of color for an appearance patch are the illumination, the transfer function, or both. Intuitively, the simplest hypotheses attribute the color to a single element of the hypothesis. As an example, consider a UCR of uniform pixel values. A simple hypothesis is one that specifies the surface as red plastic under diffuse white illumination. Such a hypothesis is intuitively plausible and simple to express. A more complex hypothesis, attributing the color to two elements, is one where both the illumination and the color of the object vary over the surface, but in such a way that their combination produces the same color and intensity at each point. This hypothesis is much more difficult to express, and is not automatically accepted by our intuition as a plausible explanation.

The varying intensity of a UCR could be due to uneven illumination, uneven coloring, or curvature of the surface. Any or all of these possibilities could occur on a single patch. Again, intuitively, some of these explanations are simpler than others. Attributing all brightness variation to the shape, for example, is the underlying assumption for many shape-from-shading algorithms.

From these observations, we can begin to answer the question "how many hypotheses must we consider?" by looking at the simplest ones first. Using simplicity, or plausibility, to select between alternative hypotheses has been suggested by Tympanum et al., and has been used as the basis for several vision systems [23][11][22][27][26]. This requires us to distinguish between simple and complex hypotheses. Furthermore, we need to look for simplicity not only within the hypotheses, but in the representations of the elements themselves. But what constitutes a simple form of each hypothesis element, and must every possibility be entertained? Furthermore, does simplicity always imply that a hypothesis is more likely, or more plausible? To answer these questions, we must delve into the meaning of what constitute classes of the hypothesis elements S, L, and ℜ, and what we mean by the terms "plausibility," "complexity," or "weirdness" with respect to a hypothesis and its elements.

3.2.1. Plausibility

In an ideal world, we would be able to quantify complexity, or "weirdness," and use it as the basis for generating and rank-ordering the possible hypotheses for a given region. The weirdness of a hypothesis might be represented by three axes indicating the complexity of the shape, transfer function, and illumination environment. Less weird explanations would be those closer to the origin of the three axes; the further from the origin, the weirder the hypothesis elements would become. By generating hypotheses close to the origin, or with only one weird element, we could begin with a small set of simple hypotheses and generate weirder ones only if necessary. Weirdness is a difficult concept to measure, however, and the axes of our weirdness scale are almost certainly non-linear and not independent.
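Under the simplifying assumption that each axis can be quantized into integer weirdness levels (the text notes the real axes are probably non-linear and dependent), the ordering described above can be sketched as follows. Hypotheses with at most one weird element come first, then larger total weirdness; the level scale itself is hypothetical.

```python
from itertools import product
from typing import List, Tuple

Levels = Tuple[int, int, int]  # (shape, transfer function, illumination) weirdness


def hypotheses_by_weirdness(max_level: int = 2) -> List[Levels]:
    """All weirdness-level triples, simplest first.

    Orders by the number of weird (non-zero) elements, then by total weirdness,
    then lexicographically so the order is deterministic.
    """
    points = product(range(max_level + 1), repeat=3)
    return sorted(points, key=lambda p: (sum(1 for v in p if v > 0), sum(p), p))
```

With `max_level=2`, the first seven entries are the origin plus the two weird levels along each single axis; weirder combinations would only be generated if all of those were rejected.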

The minimum description length [MDL] principle, however, is a mathematical formalism for "weirdness." The MDL principle says that, given a parameterization for describing a model, the best model that describes a set of data is the one that can be encoded in the fewest binary digits, or shortest length. In computer vision, the MDL principle has been used successfully by Leclerc [23], Darrell et al. [11], Krumm [22], and Leonardis [26]. If we postulate a language for S, E, L, and ℜ, then a hypothesis region and its fields and subfields are a model described in that language. Our task in segmentation is to find a set of such models that describe an entire image, or data set. Based upon the MDL principle, we propose that the most desirable sets of hypotheses that describe a particular scene are the least complex ones, or the ones that can be described most succinctly.

It is important to note that the description length has two components: the complexity of the description, and how well that description fits the data. The combination of the two components is used to select the best model. When we are dealing with a set of hypotheses for an image region, they ought to fit the data about equally well, so that term of the description length should be approximately constant. Therefore, rank-ordering the hypotheses for a region using some measure of complexity should be sufficient to satisfy the MDL criterion.
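A minimal sketch of the two-part ranking, with bit costs invented purely for illustration: each hypothesis carries a model cost and a data-fit cost, and ranking uses their sum. When the data-fit terms are roughly equal, the order is effectively decided by model complexity, as argued above. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredHypothesis:
    name: str
    model_bits: float  # hypothetical cost of encoding S, E, ℜ, and L^i
    data_bits: float   # hypothetical cost of encoding the pixels given the model

    @property
    def description_length(self) -> float:
        # Two-part code: complexity of the description plus fit to the data.
        return self.model_bits + self.data_bits


def rank_by_mdl(hypotheses: List[ScoredHypothesis]) -> List[ScoredHypothesis]:
    """Shortest total description length first."""
    return sorted(hypotheses, key=lambda h: h.description_length)
```

Here a slightly better fit (fewer data bits) cannot rescue a hypothesis whose model cost is much larger, which matches the intuition that "varying paint under varying light" should rank below "red plastic under diffuse white light".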

Because there are an infinite number of hypotheses for any UCR, care must be taken in the initial selection of the hypothesis set for each UCR. One important consideration of the MDL principle is that the optimal model, or model set, must be among those tested for shortest length. Following our methodology, we want to identify subspaces of our general parameterization that are both simple and likely to occur in general images. There are at least three approaches that could be taken to generate this model set:

• Generate a large number of possible hypotheses and test them

• Generate incrementally according to some search criterion

• Generate a small, but comprehensive, set using broad classes of the hypothesis elements; expand this set incrementally if all of its constituents are ruled out as possibilities

As indicated by the previous discussion, the first approach seems pointless and intractable. Breton et al. were able to use the approach by creating a discrete mesh of possible light source directions for a "virtual" point source, but since our model has many more parameters in both the illumination environment and the transfer function, such coverage by a discrete mesh is intractable. The second approach has merit, but it is unclear what type of search criterion is needed for this task. Instead, we propose the use of broad classes to initially assign hypotheses to a UCR, with the understanding that the particular details of a hypothesis (i.e., the actual shape, the specific colors, surface roughness, and other characteristics) will be determined at a later point in the segmentation process. It is also important to note that this set can be incrementally expanded if all of its initial constituents are considered unlikely. The broad classes, which we derive from the general model for scene description, are simple, yet comprehensive enough to cover a wide range of possible environments and objects.
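The third approach amounts to a tiered generate-and-test loop. In this sketch the tiers and the `is_ruled_out` test are hypothetical stand-ins; the point is only the control flow of starting from broad, simple classes and expanding when every candidate has been eliminated.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_hypotheses(tiers: Sequence[Sequence[T]],
                      is_ruled_out: Callable[[T], bool]) -> List[T]:
    """Return the survivors of the first tier in which any candidate survives.

    tiers: candidate hypotheses grouped from the simplest broad classes upward.
    is_ruled_out: hypothetical test that rejects a candidate for the region.
    """
    for tier in tiers:
        survivors = [h for h in tier if not is_ruled_out(h)]
        if survivors:
            return survivors  # stop expanding once something survives
    return []  # every tier exhausted; the caller must widen the classes further
```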

3.2.2. Taxonomy of Surfaces

Surfaces can be described at many levels of complexity. A cube, for example, can be modeled as a set of planar patches, a polyhedron, or a superquadric. As noted previously, when modeling objects in the real world, surfaces can take on any amount of complexity, depending upon the needs of the modeler. When reasoning about hypotheses, what we are most interested in is how the surfaces of adjacent hypothesis regions are related. When they show similar qualities, it is reasonable to consider merging the two regions.

To simplify this reasoning process, we initially consider only two classes of surfaces: curved and planar. These two classes provide a simple distinction that can be used to reason about merging hypotheses. A finer distinction would require a specific method for modeling curved surfaces, which we leave for future exploration. When a surface representation method is determined for the actual segmentation system, reasoning about merging two curved surfaces could be done based on that representation, e.g., by matching two spheres, superquadrics, generalized cylinders, or polynomial surfaces.
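One simple way to separate the two surface classes for a set of 3-D points, offered as an assumption rather than the method used in this work, is to fit a plane by singular value decomposition and threshold the root-mean-square distance to it. The smallest singular value of the centered points equals the square root of the summed squared distances to the best-fit plane. The tolerance is hypothetical and must be in scene units.

```python
import numpy as np


def classify_surface(points, tol=1e-6):
    """Label an (N, 3) array of 3-D surface points 'planar' or 'curved'."""
    p = np.asarray(points, dtype=float)
    centered = p - p.mean(axis=0)
    # Singular values in descending order; the last measures out-of-plane spread.
    s = np.linalg.svd(centered, compute_uv=False)
    rms_residual = s[-1] / np.sqrt(len(p))
    return "planar" if rms_residual < tol else "curved"
```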

3.2.3. Taxonomy of Illumination

There are several simple special forms of the incident light energy field function that represent useful models of illumination. Recall that the general form of the global incident light function is given by L^i(x, y, z, θ_x, θ_y, s, λ, t). Figure 18 shows the relationships of the subspaces we identify for this function. The largest subspace we consider is that of time-invariant illumination, where we consider time to be a constant and drop it from our parameterization. The second subspace we highlight is unpolarized time-invariant illumination, L^i(x, y, z, θ_x, θ_y, λ). For most images of interest, all of the illumination in a scene will fall into this category. Scenes with illumination outside this subspace are rare, and would be those illuminated by a polarized light source such as a laser, or by a time-varying source (over the course of the image capture process).

Figure 18. Subspaces of the global incident light energy field.

Figure 19. (Plate 13) Diffuse illumination environment.

Figure 20. (Plate 14) Uniform illumination environment.

Figure 21. (Plate 15) General illumination environment.

Within the unpolarized, time-invariant subspace are those illumination functions in which the color of the light is independent of the direction of incidence. The hue and saturation of such illumination functions are the same in all directions, and only the brightness varies over the illumination hemisphere. These illumination functions are separable into the form L^i(x, y, z, θ_x, θ_y) C(x, y, z, λ), where L^i(x, y, z, θ_x, θ_y) denotes the incoming intensity in a given direction at (x, y, z), and C(x, y, z, λ) the color of the illumination. Within the subspace of separable functions is the uniform illumination subspace, which can be written for the point (x, y, z) as L^i(θ_x, θ_y) C(λ), where L^i(θ_x, θ_y) ∈ {0, 1}. Uniform illumination thus implies that all illumination in the environment has the same color. Some important special cases of uniform lighting include:


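• Point light source at (θ_x0, θ_y0):

$$L^i(\theta_x, \theta_y) = \begin{cases} 1 & \text{if } (\theta_x, \theta_y) = (\theta_{x0}, \theta_{y0}) \\ 0 & \text{otherwise} \end{cases}$$

• Finite disk source of apex angle α centered at (θ_x0, θ_y0):

$$L^i(\theta_x, \theta_y) = \begin{cases} 1 & \text{if the angle between } (\theta_x, \theta_y) \text{ and } (\theta_{x0}, \theta_{y0}) < \alpha \\ 0 & \text{otherwise} \end{cases}$$

• Perfectly diffuse "ambient" illumination: L^i(θ_x, θ_y) = 1 for all θ_x and θ_y. Thus L^i is trivial, and the illumination is fully characterized by C(λ) at (x, y, z).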
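The three special cases can be written as directional intensity functions and combined with a color term into the separable uniform form. This sketch assumes directions are given as 3-D vectors rather than the (θ_x, θ_y) angles used in the text, approximates the point source's single direction with a small angular tolerance, and follows the disk condition as stated (angle to the center below α). All names are hypothetical.

```python
import math


def angle_between(d1, d2):
    """Angle in radians between two 3-D direction vectors (need not be unit length)."""
    dot = sum(a * b for a, b in zip(d1, d2))
    n1 = math.sqrt(sum(a * a for a in d1))
    n2 = math.sqrt(sum(b * b for b in d2))
    c = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp rounding error before acos
    return math.acos(c)


def point_source(center, tol=1e-6):
    """1 in the source direction (within tol radians), 0 otherwise."""
    return lambda d: 1.0 if angle_between(d, center) < tol else 0.0


def disk_source(center, alpha):
    """1 when the angle to the disk center is below alpha (radians), 0 otherwise."""
    return lambda d: 1.0 if angle_between(d, center) < alpha else 0.0


def ambient():
    """Perfectly diffuse illumination: 1 in every direction."""
    return lambda d: 1.0


def uniform_illumination(intensity, color):
    """Separable uniform illumination: L(d, lam) = intensity(d) * C(lam)."""
    return lambda d, lam: intensity(d) * color(lam)
```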

These three simple cases play an important role in modeling illumination. Indeed, as shown by the computer graphics community, a large number of illumination environments can be modeled using one or more point, finite disk, or ambient light sources [12]. For the purpose of reasoning about hypotheses, we use three subspaces, in order of increasing complexity (diffuse, uniform, and general illumination), to describe the forms of the illumination environment. A diffuse illumination environment, with uniform color and brightness over the hemisphere, is shown in Figure 19, along with its effect on a white sphere. Figure 20 illustrates a uniform illumination environment as specified in Figure 18, and its effect, also on a white sphere. Finally, a general illumination environment is illustrated in Figure 21, along with its effect on a metal sphere.




3.2.4. Taxonomy of the Transfer Function

Numerous common cases of the transfer function arise when we consider the subset of non-polarizing, opaque, and non-fluorescing surfaces. At present, we consider only surfaces that fall into this class. For non-polarizing materials, the polarization parameters are separable and, since we are only considering unpolarized incident light, can be removed from the overall function. For non-fluorescent materials, ℜ = 0 if λ^i ≠ λ^e, allowing the wavelength parameters to be combined into a single parameter λ. For opaque materials, the directions of incident and exitant light energy are limited to the hemisphere above the tangent plane for the surface point (u, v). With these restrictions, the transfer function becomes ℜ(u, v; θ^i, φ^i, θ^e, φ^e, λ), where 0° ≤ θ^i, θ^e ≤ 90°.

This reduced transfer function still includes surfaces with arbitrary changes in the transfer function over (u, v). Such surface patches can have differing color and texture within their extent. Therefore, we further identify two nested subsets: transfer functions that are piecewise-uniform, and those that are completely uniform over the extent of the (u, v) parameters. The subset of uniform transfer functions, shown in Figure 22, can be specified by the reduced function ℜ(θ^i, φ^i, θ^e, φ^e, λ), as it is constant over all relevant values of u and v. This form of the transfer function is recognizable as the well-known spectral bidirectional reflectance distribution function [spectral BRDF] for a uniform surface [30].

Within this set are further interesting subspaces of the transfer function. Transfer functions with surface reflection or body reflection are two important overlapping subspaces. Their relationship within the BRDF and the interaction of the union of these subspaces is shown in Figure 22. Surface reflection, as noted previously, takes place at the interface between an object and the surrounding air. The direction of the exitant light energy is governed by the surface normal at the point of reflection; it is reflected through the local surface normal in the “perfect specular direction.” The amount of light reflected is determined by Fresnel’s laws, whose parameters include the angles of incidence and emittance, the index of refraction of the material, and the polarization of the incoming light. For white metals and most man-made dielectrics the surface reflection can be considered constant over the visible spectrum [17][18]. Materials whose surface reflection is approximately constant over the visible spectrum form a useful subset and are said to have neutral interface reflection (NIR) [25]. The surface reflection from an NIR material is assumed to be the same color as the illumination. Common materials for which the surface reflection is more dependent upon wavelength include “red metals” such as gold, copper, and bronze, all of which modify the color of the reflected surface illumination [14].
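As a rough illustration of the two quantities just described, the sketch below computes the perfect specular direction for an incident direction and the unpolarized Fresnel reflectance of a smooth dielectric from the standard Fresnel equations. It assumes unit vectors pointing away from the surface, a non-absorbing dielectric with a real index of refraction, and light arriving from air (n = 1.0); the function names are illustrative, not taken from this report.

```python
import math

Vec = tuple[float, float, float]

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def specular_direction(l: Vec, n: Vec) -> Vec:
    """Mirror the unit vector l (pointing toward the light) through the unit normal n."""
    k = 2.0 * dot(n, l)
    return (k * n[0] - l[0], k * n[1] - l[1], k * n[2] - l[2])

def fresnel_unpolarized(cos_i: float, n1: float, n2: float) -> float:
    """Fraction of unpolarized light reflected at a smooth interface from medium n1 into n2."""
    sin_t = n1 / n2 * math.sqrt(max(0.0, 1.0 - cos_i * cos_i))
    if sin_t >= 1.0:
        return 1.0  # total internal reflection
    cos_t = math.sqrt(1.0 - sin_t * sin_t)
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    r_p = (n2 * cos_i - n1 * cos_t) / (n2 * cos_i + n1 * cos_t)
    # Unpolarized incident light: average the s- and p-polarized reflectances.
    return 0.5 * (r_s * r_s + r_p * r_p)

# At normal incidence on glass-like material (n = 1.5) about 4% of the light is reflected.
print(fresnel_unpolarized(1.0, 1.0, 1.5))
```

Because the reflectance here has no wavelength argument, the sketch corresponds to the NIR case; a "red metal" would need a wavelength-dependent (and complex) index of refraction.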

Many materials displaying surface reflection are optically “rough.” They possess microscopic surface facets with local surface normals that differ from the macroscopic shape, as shown in Figure 23. A subset of these rough surfaces are those with roughness characteristics, such as microfacet slopes or heights, that have a Gaussian distribution. Several reflection models, such as Torrance-Sparrow and Beckmann-Spizzichino, have been developed for rough surfaces using a Gaussian distribution assumption for some surface characteristic [3][10][13][25][29][34][38]. These models fit into our taxonomy of transfer functions as shown in Figure 22.

Figure 23. Microfacet surface reflection model.

Figure 24. Body reflection model: transparent medium with pigment particles.
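One common way to express a Gaussian roughness assumption is the Beckmann distribution of microfacet normals, used for example in the Cook-Torrance model. The sketch below evaluates it for a facet tilted by a half-angle from the macroscopic normal and checks numerically that it is normalized over the hemisphere. The RMS-slope parameter `m` and the function names are assumptions for illustration, not notation from this report.

```python
import math

def beckmann(theta_h: float, m: float) -> float:
    """Density of microfacet normals tilted by theta_h from the macroscopic normal.

    Facet slopes tan(theta_h) follow a Gaussian with RMS slope m.
    """
    c = math.cos(theta_h)
    t = math.tan(theta_h)
    return math.exp(-(t * t) / (m * m)) / (math.pi * m * m * c ** 4)

def projected_area(m: float, steps: int = 20000) -> float:
    """Midpoint-rule integral of D(h) * cos(theta_h) over the hemisphere of facet normals."""
    d_theta = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        th = (i + 0.5) * d_theta
        total += beckmann(th, m) * math.cos(th) * math.sin(th) * d_theta
    return 2.0 * math.pi * total  # the distribution is isotropic in phi

print(projected_area(0.3))  # close to 1.0 for a properly normalized distribution
```

Smaller values of `m` concentrate the facet normals around the macroscopic normal and produce a sharper specular lobe.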


A more complex form of reflection, body reflection, takes place when light penetrates a surface and interacts with colorant particles as shown in Figure 24. During this interaction, some of the wavelengths may be absorbed, coloring the reflection. The remaining wavelengths are re-emitted in random directions, striking other colorant particles, and some ultimately exit the surface as body reflection. Surfaces whose colorant particles re-emit all wavelengths of visible light equally form the “white” subset of transfer functions with body reflection. Because of the stochastic nature of this reflection, a common assumption is that the body reflection is independent of viewing direction. The subset of transfer functions that display this independence in their body reflection are called Lambertian surfaces. These subset relationships are shown in Figure 22. The body reflection of the Lambertian subset is said to obey Lambert’s Law, which states that the reflection is dependent upon the incoming light’s intensity and the cosine of the angle of incidence [16]. Improved models of body reflection are being researched [15][31][39].
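Lambert’s Law can be written as a one-line shading rule. The sketch below assumes a three-channel albedo standing in for the spectral body reflectance and uses the usual normalization of a Lambertian BRDF, albedo/π; the function name is illustrative. Note that no viewing direction appears anywhere in it.

```python
import math

def lambertian_radiance(albedo: list[float], irradiance: float, cos_incidence: float) -> list[float]:
    """Body reflection under Lambert's Law: proportional to intensity and cos(theta_i).

    The result does not depend on the viewing direction.
    """
    c = max(0.0, cos_incidence)  # surfaces facing away from the source receive no light
    return [a * irradiance * c / math.pi for a in albedo]

# A "white" surface has equal albedo in every channel; a colored one does not.
print(lambertian_radiance([1.0, 1.0, 1.0], math.pi, 0.5))
print(lambertian_radiance([1.0, 0.5, 0.2], math.pi, 0.5))
```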


Many interesting and useful transfer functions exhibit both body and surface reflection. Common materials simultaneously displaying these types of reflection include plastic, paint, glass, ink, paper, cloth, and ceramic, most of which can be modeled with the NIR assumption [30][3‘n< Transfer functions within this overlapping region have been approximated by the dichromatic reflection model [33].
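In its usual RGB form, the dichromatic reflection model writes each pixel as a weighted sum of a body-reflection color and a surface-reflection color, with weights that depend only on geometry. The sketch below assumes a three-channel camera and the NIR assumption, so the surface term takes the color of the illumination; the function name and argument layout are illustrative.

```python
def dichromatic_pixel(m_body: float, m_surface: float,
                      body_color: tuple[float, float, float],
                      illum_color: tuple[float, float, float]) -> tuple[float, ...]:
    """Pixel color as the sum of a body term and a surface (interface) term.

    body_color is the color of the body reflection as seen by the camera.
    Under NIR the surface term has the illumination color. m_body and
    m_surface are geometric scale factors (functions of the incidence,
    emittance, and phase angles), passed in directly here.
    """
    return tuple(m_body * b + m_surface * s for b, s in zip(body_color, illum_color))

# A matte point (no highlight) and a point inside a white highlight on red plastic.
print(dichromatic_pixel(0.5, 0.0, (0.8, 0.2, 0.1), (1.0, 1.0, 1.0)))
print(dichromatic_pixel(0.5, 0.4, (0.8, 0.2, 0.1), (1.0, 1.0, 1.0)))
```

Every pixel produced this way lies in the plane spanned by the two colors. Setting `m_body` to zero and passing a colored surface term in place of the illumination color gives a single-term, metal-like form in the spirit of the unichromatic model [14].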


Metals also fall into the spectral BRDF category, although they only display surface reflection and have been modeled by the unichromatic reflection model [14]. Most models for rough specular surfaces apply directly to metals.

For the purposes of our proposed segmentation method, we initially consider objects whose transfer functions fall within the union of body reflection and surface reflection. Objects with these properties naturally divide into two categories: metals and dielectrics. Metals, as noted previously, display only surface reflection; dielectrics always have some body reflection, and often display surface reflection as well, although not as strongly as metals. Illustrations of these two classes of the transfer function can be seen in Figure 12 and Figure 13.


3.3. Hypothesis Classification


Based on the above taxonomies of $S$, $L^{in}$, and $\mathfrak{R}$, we now identify a simple, yet comprehensive set of hypotheses for explaining the color and brightness variation of a UCR. To accomplish this task, we first form a set of hypothesis classes based upon the forms previously developed for the individual hypothesis elements. The broad classes for each element are:

• Surface ∈ {planar, curved}

• Illumination Environment ∈ {diffuse, uniform, general function}

• Transfer Function ∈ {metal, dielectric}


Figure 26. (Plate 17) Fundamental hypothesis with illumination as color source: (a) surface, (b) illumination environment, (c) transfer function.

The  possible  combinations  of  these  broad  classes  create  a  set  of  twelve  simple  hypothesis  forms  for  an  appearance 
patch  corresponding  to  a  UCR. 
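A short enumeration confirms the count of twelve forms; the class labels below are plain strings chosen for illustration.

```python
from itertools import product

SURFACES = ("planar", "curved")
ILLUMINATIONS = ("diffuse", "uniform", "general")
MATERIALS = ("metal", "dielectric")

# Each hypothesis form picks one option from each broad class: 2 * 3 * 2 combinations.
FORMS = list(product(SURFACES, ILLUMINATIONS, MATERIALS))
print(len(FORMS))  # 12
```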

3.3.1. Fundamental Hypotheses

To account for the distribution of color between the elements of a given form, we identify a set of hypotheses that is simple, and yet provides a significant amount of explanatory power. This set, which we call the set of fundamental hypotheses, consists of those hypotheses in which the color of the region is due to only one of the possible color-producing elements: the body reflection, the surface reflection, or the illumination environment. Figure 25 and Figure 26 illustrate two of the fundamental hypotheses for a region of the cup. In Figure 25, we see that the curved plastic is colored and the illumination is white. Figure 26 shows the illumination as the color source with white plastic. Both are equally possible explanations for the UCR.

Combining the broad classes for hypothesis elements with the requirement that the color of a pixel is due to either the transfer function or the illumination environment, but not both, creates a finite set of hypotheses that must be considered for a UCR. Given two material types, two shape classes, three illumination environments, and three possible color sources, we arrive at 36 possible hypotheses. As the body reflection cannot be the color source if there is no body reflection (metals), there are at most 30 fundamental hypotheses that explain the same UCR for non-polarizing, opaque, non-fluorescent surfaces. Note that this is true for all UCRs, no matter the shape, color, or brightness distribution.
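The count of 36 and the reduction to 30 follow from a direct enumeration. The sketch below, with hypothetical string labels for each class, makes the arithmetic explicit: the six excluded combinations are the metal hypotheses that would name the (nonexistent) body reflection as the color source.

```python
from itertools import product

MATERIALS = ("metal", "dielectric")
SHAPES = ("planar", "curved")
ILLUMINATIONS = ("diffuse", "uniform", "general")
COLOR_SOURCES = ("body", "surface", "illumination")

def has_body_reflection(material: str) -> bool:
    # Metals display only surface reflection.
    return material == "dielectric"

ALL = list(product(MATERIALS, SHAPES, ILLUMINATIONS, COLOR_SOURCES))
FUNDAMENTAL = [h for h in ALL if h[3] != "body" or has_body_reflection(h[0])]
print(len(ALL), len(FUNDAMENTAL))  # 36 30
```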

Closer analysis of these 30 fundamental hypotheses shows that the six hypotheses corresponding to dielectrics whose color source is the surface reflection are highly unlikely, and probably do not conform to the single color source rule. These six hypotheses are unlikely because of the commonly used Neutral Interface Reflection assumption, which states that the spectrum of surface reflection of a dielectric is approximately uniform, or neutral in terms of its effect on the color of the reflection [25]. This assumption is based upon the observation that one of the manufacturing criteria for the medium of many common dielectrics is that it have a neutral, or uniform, spectrum. This criterion ensures that the coloring will be imparted entirely by the pigment materials added during the manufacturing process. As most of the dielectric surfaces we are concerned with are manufactured materials (paint, plastic, glass, ink, paper, cloth, ceramic), using the NIR assumption to prune the list of fundamental hypotheses does not significantly alter the explanatory power of this method. Furthermore, if the surface reflection does not admit all wavelengths evenly to the colorant particles that constitute the body reflection, then the body reflection will be colored as well, violating our single color source constraint. Pruning these six hypotheses results in 24 fundamental hypotheses of image formation.

Figure 27. A taxonomy of fundamental hypotheses. (Tree: the 24 fundamental hypotheses split into metals, with surface reflection or illumination as the color source (CS), and dielectrics, with illumination or body reflection as the CS; each CS branch splits into diffuse, uniform, and general illumination, and each leaf contains two hypotheses, one with a curved surface and one with a planar surface.)


3.3.2. Taxonomy of Fundamental Hypotheses

We can arrange these 24 hypotheses in a tree structure according to the material type, color source, illumination environment, and shape as shown in Figure 27. The first branching indicates the material, or general transfer function, of the hypothesis and divides the 24 hypotheses into two subsets. The second branching indicates the color source of the hypothesis. As the body reflection cannot be a color source for a metal, and the surface reflection cannot be a color source for a dielectric, four subsets result from this branching. The third branching specifies the illumination environment of the hypothesis. With three possible illumination environments for each category of material and color source, this divides the 24 hypotheses into twelve subsets. Each of these subsets has two leaves, not shown in the tree, one representing a hypothesis with a curved surface, and one with a planar surface.
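The branching just described can be reproduced mechanically. The sketch below, whose names are illustrative rather than taken from this report, applies the two exclusions (no body reflection for metals, and the NIR pruning of surface reflection as a dielectric's color source) and counts the nodes at each level of the tree.

```python
from collections import defaultdict
from itertools import product

MATERIALS = ("metal", "dielectric")
COLOR_SOURCES = ("body", "surface", "illumination")
ILLUMINATIONS = ("diffuse", "uniform", "general")
SHAPES = ("curved", "planar")

def allowed(material: str, source: str) -> bool:
    # Metals have no body reflection; NIR rules out a dielectric's surface reflection as color source.
    if material == "metal":
        return source != "body"
    return source != "surface"

def build_taxonomy() -> dict:
    """Nested dict: material -> color source -> illumination -> list of shapes (the leaves)."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for mat, src, ill, shape in product(MATERIALS, COLOR_SOURCES, ILLUMINATIONS, SHAPES):
        if allowed(mat, src):
            tree[mat][src][ill].append(shape)
    return tree

tree = build_taxonomy()
branches = [(m, s) for m in tree for s in tree[m]]
subsets = [(m, s, i) for m, s in branches for i in tree[m][s]]
leaves = [leaf for m, s, i in subsets for leaf in tree[m][s][i]]
print(len(tree), len(branches), len(subsets), len(leaves))  # 2 4 12 24
```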

The resulting tree with its 24 leaves represents a taxonomy of fundamental, or the simplest, hypotheses, classifying the different physical explanations for an image region. The true importance of this taxonomy is that it represents a finite set of simple, yet relatively comprehensive hypotheses for describing an appearance patch corresponding to a UCR. Therefore, we can postulate a hypothesis set, with a reasonably small number of hypotheses, for each UCR we identify in an image. This provides an initial segmentation and sets the stage for us to begin reasoning about and merging hypothesis regions.




4.  Analysis  and  Merging  of  Hypotheses 

In this section we further analyze the fundamental hypotheses and develop a set of tools for comparing and merging them. To illustrate these tools we simultaneously work through a simple example image of a Lambertian sphere with a stripe in the middle, shown in Figure 28. The goal of this section is to develop the outline of a segmentation algorithm using reasoning about the physics underlying fundamental hypothesis regions.


4.1.  Analysis  of  the  Fundamental  Hypotheses 


The taxonomy shown in Figure 27 might be taken to suggest that all of the fundamental hypotheses are of equal value in explaining a scene. We do not believe this is the case for most images. As a deeper analysis shows, several of the fundamental hypotheses have little relative value in explaining image regions. This same analysis also shows that some mechanism will have to be proposed for the orderly development of more complex hypotheses to explain some common physical phenomena.

We begin with a structured analysis of each subtree of the taxonomy, considering in turn each of the four possible combinations of material and color source and the six associated hypotheses. The goal of this examination is to divide the 24 hypotheses into two groups, or tiers, corresponding to common and rare physical situations. Common hypotheses we specify as belonging to tier one, and rare hypotheses we place in tier two.

We begin with the subtree corresponding to colored dielectrics under white illumination. These six hypotheses are grouped into three pairs according to the illumination environment. Clearly, curved and planar dielectrics under uniform lighting form a large subset of objects in a typical scene. Scenes that can be modeled by these two hypotheses include paper, plastic, and painted objects under one or more light sources of approximately equivalent brightness. Sunlight can also frequently be approximated by a uniform source when considering dielectrics because its effect on dielectric surfaces usually overwhelms any illumination from other directions. Likewise, curved and planar dielectrics under diffuse lighting are often used as a model for surfaces in shadow, where no light source is directly incident on the surface [10].

Curved and planar dielectrics under general function white lighting are an interesting pair of hypotheses. In the real world, they are probably the most common hypotheses, as uniform and diffuse lighting are only approximations of the real world. In the case of dielectrics, however, uniform and diffuse lighting models are probably sufficient for most surfaces. The major reason is that dielectrics, unlike metals, have a strong body reflection component: they reflect some of the light from each incident direction in each exitant direction. In the extreme case, a perfectly Lambertian surface reflects the incident light from a single direction equally in all directions. Practically, this means that the exitant light energy field due to a strong incident light source from a single direction can overshadow the additional exitant light due to the incident light from all other directions. In scenes where there are one or more light sources incident on an object, therefore, we propose that most lighting conditions can be modeled as a set of uniform-brightness white sources, which falls under the uniform illumination category. This analysis is strengthened by the fact that the “general function” illumination in this case must still have a uniform spectrum (black, white, or grey) at each point on the hemisphere, because the color source is the body reflection. This makes the uniform illumination category an even better approximation to the general function category, because only the geometry is approximated, rather than the spectral characteristics of the illumination. Because of this analysis, we propose that the branch of the taxonomy with colored dielectrics under white illumination has four common hypotheses, which belong in tier one (those with diffuse and uniform illumination environments), and two rare hypotheses, which belong in tier two (those with the general function illumination environment).
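To make the dominant-source argument concrete, the sketch below compares the Lambertian (body-reflection) response to a general illumination environment, sampled as a few white point sources, with the response to its single strongest source alone. The source directions and strengths are made-up numbers chosen only to illustrate the effect; real environments will differ, and the shared albedo/π factor is dropped.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def lambert_sum(normal, sources):
    """Body reflection (up to the constant albedo/pi) from a list of (direction, strength) sources."""
    total = 0.0
    for direction, strength in sources:
        d = normalize(direction)
        total += strength * max(0.0, sum(a * b for a, b in zip(normal, d)))
    return total

# Hypothetical environment: one strong white source plus weak white light from other directions.
strong = ((0.0, 0.3, 1.0), 1.0)
weak = [((1.0, 0.0, 0.2), 0.03), ((-1.0, 0.0, 0.2), 0.03), ((0.0, -1.0, 0.2), 0.03)]

for normal in [(0.0, 0.0, 1.0), normalize((0.5, 0.0, 1.0)), normalize((-0.3, 0.4, 1.0))]:
    full = lambert_sum(normal, [strong] + weak)
    approx = lambert_sum(normal, [strong])
    print(f"relative error of the single-source model: {(full - approx) / full:.3f}")
```

With these numbers the single-source approximation stays within a few percent across the sampled normals, which is the behavior the argument above relies on.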

The next branch corresponds to white dielectrics under colored illumination. In common scenes we suggest that situations corresponding to these hypotheses are rare. The most common occurrence of these is probably interreflection between a colored object and a white dielectric object such as a white wall. In these cases, the white object is lit by both a direct light source and some type of colored reflection from a nearby object. The illumination environment corresponding to this case can only be represented by a general function illumination environment, as both the direct illumination and the interreflection are significant. The hypotheses corresponding to colored diffuse illumination are less common, generally occurring when the white object is in shadow from direct sources but still experiences reflection from a nearby colored object. Colored uniform sources (blue light bulbs, for example) are not common in human environments. Given this analysis, we propose that the curved and planar hypotheses with general function illumination be placed in tier one, and the other four hypotheses in tier two.

Figure 28. (Plate 18) Three-color Lambertian sphere.

White metals under colored illumination form the next branch of the taxonomy. Unlike for dielectrics, incident light from almost all directions is significant in the appearance of a metal appearance patch. This can be seen in Figure 1, where inter-reflected light that is dim relative to the global light source still has an effect on the appearance of the metal objects. For this reason, the hypotheses with general function illumination are the most common. It is rare for a metal surface to be lit only by colored uniform illumination, or to have the same color and intensity of light incident from all directions as under diffuse illumination. Furthermore, unlike for dielectrics, diffuse illumination environments are not good approximations, because the exitant light energy field in a given direction is dependent on only one direction of the incident light energy field. Therefore, the two hypotheses with general function illumination belong to the first tier, and the other four hypotheses (diffuse and uniform illumination) belong to the second tier.

The final branch of hypotheses is the colored metals under white illumination. As with grey metals, the hypotheses with general function illumination are the most common models for colored metal objects. Unfortunately, because of the single color source constraint, this general function illumination cannot also be colored, severely restricting the set of objects these hypotheses can model. In fact, we propose that uniform illumination is sufficient to model any surfaces that would correspond to colored metal under white illumination. Diffuse illumination, as with grey metals, we believe is rare. From this analysis, the two hypotheses with uniform illumination belong in tier one; the other four belong in tier two.

It is the analysis of colored metals that most clearly demonstrates the need for a method to incorporate more complex hypotheses into the reasoning process. Our definition of “fundamental hypotheses” stipulating a single color source for a UCR will not be adequate to explain many images of colored metals, because their appearance will also depend on colored interreflection from nearby objects. The other area where more complex hypotheses are needed is interreflection between colored objects, especially dielectrics. In the example we work through herein, these problems do not arise. However, we will ultimately need a mechanism for infusing more complex hypotheses (for example, red metals under colored illumination) in order to achieve the generality we desire in this segmentation method.

The overall result of this analysis is that there are ten common fundamental hypotheses in tier one, and fourteen less common or rare fundamental hypotheses in tier two. Figure 30 through Figure 39 illustrate the ten fundamental hypotheses in tier one.
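The tier assignments argued for in the four preceding paragraphs can be tabulated compactly. The sketch below encodes them as a lookup from (material, color source) to the tier-one illumination environments, a hypothetical representation, and recounts both tiers.

```python
from itertools import product

ILLUMINATIONS = ("diffuse", "uniform", "general")
SHAPES = ("curved", "planar")

# Tier-one illumination environments for each (material, color source) branch.
TIER_ONE = {
    ("dielectric", "body"): {"diffuse", "uniform"},   # colored dielectric, white light
    ("dielectric", "illumination"): {"general"},      # white dielectric, colored light
    ("metal", "illumination"): {"general"},           # white metal, colored light
    ("metal", "surface"): {"uniform"},                # colored metal, white light
}

def tier(material: str, source: str, illumination: str) -> int:
    return 1 if illumination in TIER_ONE[(material, source)] else 2

hypotheses = [(m, s, i, sh) for (m, s) in TIER_ONE for i, sh in product(ILLUMINATIONS, SHAPES)]
tier_one = [h for h in hypotheses if tier(h[0], h[1], h[2]) == 1]
print(len(hypotheses), len(tier_one), len(hypotheses) - len(tier_one))  # 24 10 14
```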

In  our  example  segmentation,  we  consider  only  the  fundamental  hypotheses  in  tier  one.  As  shown  in  Figure  29,  the 
two-color  sphere  divides  into  three  UCRs:  top,  middle,  and  bottom.  To  each  region  we  can  attach  the  list  of  ten 
hypotheses  from  tier  one,  forming  three  hypothesis  sets  of  ten  hypotheses  each. 


Figure 29. (Plate 19) Three hypothesis sets in the example image.

4.2. Merging Hypothesis Regions

Each of the UCRs in our example image has a hypothesis set with 10 fundamental hypotheses explaining its physics of formation. We seek to agglomerate small regions into big ones in order to search for coherence between regions. Our basic method is to take two adjacent hypothesis sets $HS_1 = (P_1, H_{11}, H_{12}, \ldots)$ and $HS_2 = (P_2, H_{21}, H_{22}, \ldots)$ and form a new hypothesis set $HS_3 = (P_1 \cup P_2, H_{31}, H_{32}, \ldots)$, in which the hypotheses $H_{3i}$ are created by merging compatible hypotheses $H_{1j}$ and $H_{2k}$.
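The merge operation can be sketched as a small data structure plus a pairwise combination step. In this hypothetical representation, a region $P$ is a set of pixel coordinates, each explanation is a tuple holding one fundamental hypothesis per original UCR, and compatibility is a caller-supplied predicate. With a predicate that accepts every pair, the sketch reproduces the “bulldozer” count discussed in the next paragraph.

```python
from dataclasses import dataclass
from typing import Callable, Hashable

@dataclass
class HypothesisSet:
    """A region P together with the explanations proposed for it.

    Each explanation is a tuple with one fundamental hypothesis per original
    UCR covered by P, so merged explanations remain inspectable.
    """
    region: frozenset[tuple[int, int]]   # pixel coordinates (row, col)
    hypotheses: list[tuple[Hashable, ...]]

def merge_sets(hs1: HypothesisSet, hs2: HypothesisSet,
               compatible: Callable[[tuple, tuple], bool]) -> HypothesisSet:
    """Form HS3 over P1 | P2 from every pair (H1j, H2k) the predicate accepts."""
    merged = [h1 + h2 for h1 in hs1.hypotheses for h2 in hs2.hypotheses
              if compatible(h1, h2)]
    return HypothesisSet(hs1.region | hs2.region, merged)

# Three regions with ten placeholder hypotheses each, merged without any constraint.
ten = [(f"H{i}",) for i in range(1, 11)]
top = HypothesisSet(frozenset({(0, 0)}), ten)
middle = HypothesisSet(frozenset({(1, 0)}), ten)
bottom = HypothesisSet(frozenset({(2, 0)}), ten)

def accept_all(a: tuple, b: tuple) -> bool:
    return True

whole = merge_sets(merge_sets(top, middle, accept_all), bottom, accept_all)
print(len(whole.hypotheses))  # 1000
```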

A bulldozer approach would consider all possible combinations of the fundamental hypotheses, resulting in $10^3$ aggregate hypotheses. But are there really 1000 plausible explanations for this combination of three regions? Such a merging method is not only unreasonable, but also too expensive to use even on simple images because of the exponential explosion of the number of hypotheses. The interaction between hypothesis regions and the nature of the physical explanations must provide a guide or constraint to limit this explosion.

Fortunately, the goal of the segmentation process provides a partial solution. The mergers in which we are interested during segmentation involve coherence in the general variables: material type, shape, color source, and illumination environment. When two hypotheses match in several or all of these four variables, but differ in color or other subfeature, it makes sense to combine them into a single region. It does not make sense to combine two hypotheses that propose different materials at this stage of the image analysis. Nor does it make sense to combine a hypothesis proposing the surface reflection as the color source with a neighboring hypothesis that proposes the body reflection as the color source. While such a merger may make sense on a more abstract scale (consider a watch with a painted face and a metal watchband), it does not make sense in a low-level segmentation. Likewise, at this level of segmentation we propose that two hypotheses of differing shape should not be merged.

On the other hand, it is possible that two hypotheses with differing illumination environments should be combined. A common example of this is an object partly in shadow. One hypothesis for the surface not in shadow could have uniform illumination, while one hypothesis for the region in shadow may have diffuse illumination. Combining these two hypotheses is desirable if the surface shape and material match. The resulting hypothesis would have a general function illumination environment, albeit with recognizable structure.

The constraints requiring that mergeable hypothesis pairs must have the same material type and color source sharply curtail the number of resulting explanations. The chart in Figure 40 shows all possible combinations of the fundamental hypotheses of two regions for the ten hypotheses in tier one. As it shows, twelve hypotheses result from merging the two hypothesis sets containing the ten hypotheses from tier one. The explicit rules we use to obtain these twelve hypotheses are:

• Hypotheses of differing materials should not be merged.

• Hypotheses of differing color sources should not be merged.

• Hypotheses of differing shape should not be merged.

• “Colored metal” hypotheses of differing chromaticity and similar illumination should not be merged.

• If the hypotheses differ in their chromaticity and the illumination is the color source, then hypotheses with diffuse illumination environments should not be merged.

Figure 34. (Plate 24) Hypothesis 5: planar-general ill.-grey metal.

Figure 35. (Plate 25) Hypothesis 6: curved-general ill.-grey metal.

Figure 36. (Plate 26) Hypothesis 7: planar-uniform ill.-colored metal.

Figure 37. (Plate 27) Hypothesis 8: curved-uniform ill.-colored metal.

Figure 38. (Plate 28) Hypothesis 9: planar-general ill.-grey or white dielectric.

Figure 39. (Plate 29) Hypothesis 10: curved-general ill.-grey or white dielectric.

Figure 40. Possible mergers of the ten “best” fundamental hypotheses for two regions of differing color. The grey squares indicate the desirable mergers. (Chart rows and columns: colored dielectric with diffuse or uniform illumination, grey dielectric with general illumination, colored metal with uniform illumination, and grey metal with general illumination, each planar or curved.)


The reasoning behind the first three rules should be clear: we do not want to propose abstract relationships between image regions at this low-level stage of segmentation. The fourth rule results from the fact that the surface reflection, or material properties of the surface, determines the color of “red metal” hypotheses. Therefore, if two of these hypothesis regions differ in color but have the same illumination environment, they must be different materials. As they are different materials, they should not be merged.


The last rule is due to the physics of illumination. Diffuse illumination specifies that the color and intensity of the illumination are constant over the illumination hemisphere. Now consider two adjacent appearance patches with the illumination as the color source. If the illumination is diffuse, and the adjacent patches are at less than a 180° angle, there will be overlap between the illumination environments of the two patches. If the two patches are differing colors, this situation is impossible unless the illumination is such that each point on the illumination hemisphere appears one color from one appearance patch and a different color from the adjacent appearance patch. Such an illumination environment is unlikely at best and is reasonably discarded.

Returning to our example, by merging the top and middle regions, and the middle and bottom regions, we obtain twelve possible hypotheses for the merger of each pair. As the hypotheses for the middle region can be matched for each pair, there are, in fact, twenty resulting hypotheses for the entire sphere. These twenty hypotheses are listed in Table 1.
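The five merging rules translate directly into a predicate over pairs of fundamental hypotheses. The sketch below is one possible encoding; the field names, and the reading of rule 5 as rejecting a pair if either hypothesis has diffuse illumination, are assumptions. Assuming each adjacent pair of regions differs in color, it recovers the twelve pairwise mergers of Figure 40 and the twenty hypotheses for the three-region sphere.

```python
from itertools import product
from typing import NamedTuple

class Hyp(NamedTuple):
    material: str      # "metal" or "dielectric"
    source: str        # color source: "body", "surface", or "illumination"
    illumination: str  # "diffuse", "uniform", or "general"
    shape: str         # "planar" or "curved"

SHAPES = ("planar", "curved")

# The ten tier-one fundamental hypotheses.
TIER_ONE = (
    [Hyp("dielectric", "body", ill, sh) for ill in ("uniform", "diffuse") for sh in SHAPES]
    + [Hyp("metal", "illumination", "general", sh) for sh in SHAPES]
    + [Hyp("metal", "surface", "uniform", sh) for sh in SHAPES]
    + [Hyp("dielectric", "illumination", "general", sh) for sh in SHAPES]
)

def mergeable(a: Hyp, b: Hyp, colors_differ: bool) -> bool:
    """Apply the five merging rules to two adjacent fundamental hypotheses."""
    if a.material != b.material or a.source != b.source or a.shape != b.shape:
        return False  # rules 1-3: material, color source, and shape must match
    if (colors_differ and a.material == "metal" and a.source == "surface"
            and a.illumination == b.illumination):
        return False  # rule 4: differently colored "colored metals" under similar light
    if (colors_differ and a.source == "illumination"
            and "diffuse" in (a.illumination, b.illumination)):
        return False  # rule 5: diffuse light cannot show two colors to adjacent patches
    return True

# Two adjacent regions of differing color.
pairs = [(a, b) for a, b in product(TIER_ONE, repeat=2) if mergeable(a, b, True)]

# Top, middle, and bottom regions: both adjacent pairs must be mergeable.
chains = [(t, m, b) for t, m, b in product(TIER_ONE, repeat=3)
          if mergeable(t, m, True) and mergeable(m, b, True)]
print(len(pairs), len(chains))  # 12 20
```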


Table 1. Final set of hypotheses for the example image

Hyp.  Tier  Top Region              Middle Region           Bottom Region
1     1     Diel/CS=BR/Uni/Curved   Diel/CS=BR/Uni/Curved   Diel/CS=BR/Uni/Curved
2     2     Diel/CS=BR/Dif/Curved   Diel/CS=BR/Dif/Curved   Diel/CS=BR/Dif/Curved
3     2     Diel/CS=BR/Uni/Planar   Diel/CS=BR/Uni/Planar   Diel/CS=BR/Uni/Planar
4     2     Diel/CS=BR/Dif/Planar   Diel/CS=BR/Dif/Planar   Diel/CS=BR/Dif/Planar
5     2     Metal/CS=IL/Gen/Curved  Metal/CS=IL/Gen/Curved  Metal/CS=IL/Gen/Curved
6     2     Metal/CS=IL/Gen/Planar  Metal/CS=IL/Gen/Planar  Metal/CS=IL/Gen/Planar
7     3     Diel/CS=IL/Gen/Curved   Diel/CS=IL/Gen/Curved   Diel/CS=IL/Gen/Curved
8     3     Diel/CS=IL/Gen/Planar   Diel/CS=IL/Gen/Planar   Diel/CS=IL/Gen/Planar
9     4     Diel/CS=BR/Uni/Curved   Diel/CS=BR/Uni/Curved   Diel/CS=BR/Dif/Curved
10    4     Diel/CS=BR/Uni/Curved   Diel/CS=BR/Dif/Curved   Diel/CS=BR/Uni/Curved
11    4     Diel/CS=BR/Uni/Curved   Diel/CS=BR/Dif/Curved   Diel/CS=BR/Dif/Curved
12    4     Diel/CS=BR/Dif/Curved   Diel/CS=BR/Uni/Curved   Diel/CS=BR/Uni/Curved
13    4     Diel/CS=BR/Dif/Curved   Diel/CS=BR/Uni/Curved   Diel/CS=BR/Dif/Curved
14    4     Diel/CS=BR/Dif/Curved   Diel/CS=BR/Dif/Curved   Diel/CS=BR/Uni/Curved
15    4     Diel/CS=BR/Uni/Planar   Diel/CS=BR/Uni/Planar   Diel/CS=BR/Dif/Planar
16    4     Diel/CS=BR/Uni/Planar   Diel/CS=BR/Dif/Planar   Diel/CS=BR/Uni/Planar
17    4     Diel/CS=BR/Uni/Planar   Diel/CS=BR/Dif/Planar   Diel/CS=BR/Dif/Planar
18    4     Diel/CS=BR/Dif/Planar   Diel/CS=BR/Uni/Planar   Diel/CS=BR/Uni/Planar
19    4     Diel/CS=BR/Dif/Planar   Diel/CS=BR/Uni/Planar   Diel/CS=BR/Dif/Planar
20    4     Diel/CS=BR/Dif/Planar   Diel/CS=BR/Dif/Planar   Diel/CS=BR/Uni/Planar

Diel = dielectric; CS = color source; BR = body reflection; IL = illumination; Uni = uniform; Dif = diffuse; Gen = general function illumination.

4.3. Ranking Hypotheses

At this point in the segmentation, we use our postulate that the simplest explanation is the best explanation for a hypothesis set. As noted previously, because each broad hypothesis can provide a good approximation to the data, to implement the MDL principle we rank-order the hypotheses into classes, or tiers, according to their relative simplicity in explaining the combined image regions. It is important to realize that this stage of the segmentation process is dependent upon the specific region being described. For example, a region of uniform pixel values can easily be described by a region of homogeneous color under a diffuse illumination environment. Regions such as those in our example, however, require a surface of non-homogeneous color if the illumination environment is diffuse. By using vision tools such as isobrightness contours, color histograms [21], and normalized color [14], as well as reasoning about the possible realizations given specific regions and hypotheses, we believe it is possible to rank-order the resulting merged hypotheses.
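Normalized color is one of the tools mentioned above. A common definition divides each channel by the sum of the channels, which removes brightness changes caused by shading. The sketch below uses that definition, which may differ in detail from the one in [14], together with a hypothetical tolerance test for whether a region has a single chromaticity.

```python
def normalized_color(rgb: tuple[float, float, float]) -> tuple[float, ...]:
    """Chromaticity (r, g, b) = (R, G, B) / (R + G + B); black pixels map to (0, 0, 0)."""
    total = sum(rgb)
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(c / total for c in rgb)

def has_uniform_chromaticity(pixels, tol: float = 0.02) -> bool:
    """True if every pixel's chromaticity lies within tol (per channel) of the first pixel's."""
    ref = normalized_color(pixels[0])
    return all(max(abs(a - b) for a, b in zip(normalized_color(p), ref)) <= tol
               for p in pixels)

# Bright and shaded pixels of one Lambertian material share a chromaticity.
print(has_uniform_chromaticity([(200, 100, 50), (40, 20, 10)]))
print(has_uniform_chromaticity([(200, 100, 50), (50, 100, 200)]))
```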

We finish our example by rank-ordering the final twenty hypotheses for the example image. We realize that we are putting some human reasoning into this process, but it is the first step towards developing a more rigorous, computable process.
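One way a more computable version of this step might look is a two-part description length in the spirit of Rissanen [32]: a hypothesis costs the bits needed to state its parameters plus the bits needed to encode the residuals it leaves. The sketch below assumes Gaussian residuals and charges (1/2) log2 n bits per real-valued parameter, a common asymptotic approximation; the class, the function names, and the example parameter counts are hypothetical, not values from this report. Because every broad hypothesis can approximate the data well, the residual terms may nearly tie, leaving the parameter count to decide, which is the intuition behind grouping hypotheses into tiers.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FittedHypothesis:
    name: str                   # hypothetical label for a hypothesis
    num_params: int             # real-valued parameters needed to regenerate the regions
    residuals: Sequence[float]  # observed minus predicted value, one per pixel


def description_length(h: FittedHypothesis, min_var: float = 1e-6) -> float:
    """Two-part code length in bits, up to a constant shared by all hypotheses."""
    n = len(h.residuals)
    if n == 0:
        raise ValueError("no data to encode")
    variance = max(sum(e * e for e in h.residuals) / n, min_var)  # ML variance, floored
    model_bits = 0.5 * h.num_params * math.log2(n)                 # cost of the parameters
    data_bits = 0.5 * n * math.log2(2 * math.pi * math.e * variance)  # Gaussian residual cost
    return model_bits + data_bits


def rank_by_mdl(hypotheses: List[FittedHypothesis]) -> List[str]:
    """Names ordered from shortest (simplest adequate) to longest description."""
    return [h.name for h in sorted(hypotheses, key=description_length)]


res = [0.01, -0.01] * 50  # two hypotheses that fit equally well
candidates = [
    FittedHypothesis("poor fit, 2 params", 2, [0.5, -0.5] * 50),
    FittedHypothesis("good fit, 40 params", 40, res),
    FittedHypothesis("good fit, 10 params", 10, res),
]
print(rank_by_mdl(candidates))
# ['good fit, 10 params', 'good fit, 40 params', 'poor fit, 2 params']
```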

Clearly, the simplest explanation for this scene is hypothesis 1, which proposes that each region is a colored dielectric under uniform illumination. This hypothesis, the first in Table 1, belongs by itself in the first tier. We propose this hypothesis as the simplest explanation because a realization exists in which each element of the hypothesis is both homogeneous and simple for all three sub-regions. Because of this homogeneity and simplicity, we can specify the scene with a small number of parameters and recreate it exactly. These parameters include the body reflection color of each region of the sphere, the radius of the sphere, the position of the sphere, the position of the light source, the color of the light source, and the parameters of the roughness model (e.g., Cook-Torrance [10]). No other hypothesis in Table 1 has as compact a realization.
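To illustrate why hypothesis 1 is so compact, here is a minimal Python renderer that regenerates a three-color sphere (compare Plate 18) from a handful of numbers. It assumes a purely Lambertian body reflection (the roughness term is omitted), an orthographic view, a single distant light, and three regions defined as vertical bands of the image; the function and its parameters are hypothetical.

```python
import math
from typing import List, Optional, Tuple

RGB = Tuple[float, float, float]


def render_three_color_sphere(
    size: int,
    radius: float,
    light_dir: Tuple[float, float, float],
    light_color: RGB,
    region_albedos: Tuple[RGB, RGB, RGB],
) -> List[List[Optional[RGB]]]:
    """Lambertian sphere centered in a size x size image (orthographic view).

    Pixels off the sphere are None. The image is split into three vertical
    bands (left, middle, right), each with its own body reflection color.
    """
    lx, ly, lz = light_dir
    norm = math.sqrt(lx * lx + ly * ly + lz * lz)
    if norm == 0:
        raise ValueError("light direction must be non-zero")
    lx, ly, lz = lx / norm, ly / norm, lz / norm
    center = (size - 1) / 2.0
    image: List[List[Optional[RGB]]] = []
    for row in range(size):
        line: List[Optional[RGB]] = []
        for col in range(size):
            x = (col - center) / radius
            y = (center - row) / radius
            d2 = x * x + y * y
            if d2 > 1.0:
                line.append(None)  # background, not part of the sphere
                continue
            z = math.sqrt(1.0 - d2)  # unit surface normal is (x, y, z)
            shading = max(0.0, x * lx + y * ly + z * lz)  # Lambert's cosine law
            band = min(2, 3 * col // size)  # which of the three regions
            albedo = region_albedos[band]
            r, g, b = (a * c * shading for a, c in zip(albedo, light_color))
            line.append((r, g, b))
        image.append(line)
    return image


albedos = ((0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8))
img = render_three_color_sphere(21, 10.0, (0.0, 0.0, 1.0), (1.0, 1.0, 1.0), albedos)
print(img[0][0])    # None: outside the sphere
print(img[10][10])  # (0.1, 0.8, 0.1): middle band, facing the light
```

Apart from the image size and the radius, this whole scene is 15 numbers: a light direction, a light color, and three body reflection colors.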

Tier two contains hypotheses 9 and 16, corresponding to a colored planar dielectric under white uniform or diffuse illumination; hypothesis 8, corresponding to a painted ball under diffuse illumination; and hypotheses 17 and 18, specifying that the image is planar or curved white metal reflecting colored light. The first two hypotheses propose that we are looking at an image of a picture. The grey metal hypotheses propose that we are looking at the reflection of an object in a mirror. We place these hypotheses in tier two because each of them puts all of the complexity for the image into a single element of the description, and the other elements of the hypotheses are simple and homogeneous over all three regions. Furthermore, each of these hypotheses is realizable without the use of strange light sources or careful setup.

In tier three we place hypotheses 19 and 20, corresponding to white dielectrics under colored illumination. One realization of these hypotheses is a white object onto which the scene is being projected. Note that either active optical elements (a lens in a projector) or careful positioning of the light sources may be necessary to recreate these hypotheses in a lab. Nevertheless, the remaining elements of these hypotheses are simple and homogeneous for all three sub-regions, which differentiates them from the remaining hypotheses in tier four.

Tier four contains the remaining 12 hypotheses, each of which is some combination of colored dielectrics under uniform or diffuse illumination. In none of these hypotheses are all of the regions homogeneous in their simple elements. This differentiates them from the first three tiers and makes their physical realization more complex, or ‘weird.’
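The tier criteria above are informal. The following sketch encodes them as rules over a hand-built summary of each hypothesis, only to make the ordering explicit; the `HypothesisSummary` fields and the flag values given to hypotheses 1, 9, 19, and 15 are hypothetical encodings of the prose, not a computable test provided by this report.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HypothesisSummary:
    number: int
    complex_elements: int           # elements (shape, illumination, transfer function)
                                    # that carry non-simple structure, e.g. a painted pattern
    same_kind_in_all_regions: bool  # simple elements share one description over all regions
    needs_special_setup: bool       # e.g. a projector lens or carefully placed lights


def tier(h: HypothesisSummary) -> int:
    """Hypothetical rules mirroring the four tiers described in the text."""
    if not h.same_kind_in_all_regions or h.complex_elements > 1:
        return 4  # regions differ in their simple elements
    if h.complex_elements == 0:
        return 1  # every element homogeneous and simple
    return 3 if h.needs_special_setup else 2  # all complexity in one element


def rank(hypotheses: List[HypothesisSummary]) -> Dict[int, List[int]]:
    """Group hypothesis numbers by tier, lowest (simplest) tier first."""
    tiers: Dict[int, List[int]] = {}
    for h in hypotheses:
        tiers.setdefault(tier(h), []).append(h.number)
    return {t: sorted(nums) for t, nums in sorted(tiers.items())}


examples = [
    HypothesisSummary(19, complex_elements=1, same_kind_in_all_regions=True, needs_special_setup=True),
    HypothesisSummary(1, complex_elements=0, same_kind_in_all_regions=True, needs_special_setup=False),
    HypothesisSummary(15, complex_elements=0, same_kind_in_all_regions=False, needs_special_setup=False),
    HypothesisSummary(9, complex_elements=1, same_kind_in_all_regions=True, needs_special_setup=False),
]
print(rank(examples))  # {1: [1], 2: [9], 3: [19], 4: [15]}
```

The hard part, which this sketch simply assumes away, is deriving the three flags from the image rather than from human judgment.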

What this analysis provides for our example image is a set of suggested segmentations. Furthermore, these segmentations are rank-ordered, giving a higher-level program a sense of which are the ‘best’ segmentations of the image. While the criteria and reasoning used to rank-order the segmentations are not rigorous enough in this formulation to allow a computer to reproduce these results, we believe this method asks the right questions and lays the foundation for a rigorous segmentation algorithm.

5.  Conclusions 

What we have presented herein is an abstract analysis of the problems and methods involved in the segmentation of general color images. To support this analysis, we presented a general model and nomenclature describing the physics of image formation. We have also provided a rough example of our segmentation framework, demonstrating the major themes and ideas.

We have not presented an implementation based upon our analysis. Implementing even subsections of this method will be a large undertaking. The work by Breton et al. [5], for example, demonstrates the type of reasoning and algorithms necessary for each of the fundamental hypotheses. Their method, which analyzes large sets of surface shapes and lighting positions for a Lambertian surface, fits completely within a single fundamental hypothesis. As the taxonomy of fundamental hypotheses demonstrates, there are some areas in which very little research has been undertaken to date. Describing this segmentation method in a computable fashion will require integrating numerous techniques from diverse areas of physics-based vision.




The value of our analysis is that for the first time we are examining where in the general segmentation process it is appropriate to apply specific physics-based vision techniques, and how to integrate them into a whole. We have also explicitly identified some of the difficulties inherent in integrating and reasoning about the physics of image formation. Our analysis is both a basis and a set of guidelines for future research towards the development of an integrated segmentation system.

The analysis of Rissanen [32] in his discussion of the selection of model classes also provides a methodology for our analysis, especially our selection of fundamental hypotheses, or model classes. The process of choosing the fundamental hypotheses is one of selecting an initial set of model families with which to analyze the image regions. Rissanen argues that there is no algorithmic method for undertaking this task, and that human intuition is indispensable. What we have provided herein is a structured analysis of the segmentation problem that suggests a relatively small, justifiable set of models for the physics of a scene.

The potential of an integrated segmentation system based on a general model of the physics of image formation is tremendous. Because such a system would rely upon physical models, a proposed segmentation becomes not just a set of regions, but a physical explanation for every pixel in the image, together with how those explanations relate in the 3-D world with regard to shape, transfer function, and illumination. By considering multiple hypotheses for image regions, it should be able to provide multiple segmentations of the entire image, reflecting in a structured manner the ambiguity that is present in the mapping from an image to the real world. Finally, because the physical models are general enough to capture virtually any illumination environment, transfer function, or surface shape, this segmentation method has the potential to work on a wide range of images without prior knowledge of the scene.


Acknowledgements: 

We would like to thank Glenn Healey, Lawrence Wolff, and John Krumm for their fruitful and insightful discussions of the issues raised in this paper.


[1] E. Adelson and J. Bergen, “The Plenoptic Function and the Elements of Early Vision,” in Computational Models of Visual Processing, ed. M. S. Landy and J. A. Movshon, Cambridge, MIT Press, 1991.

[2] R. Bajcsy, S. W. Lee, and A. Leonardis, “Color image segmentation with detection of highlights and local illumination induced by inter-reflection,” in Proc. International Conference on Pattern Recognition, Atlantic City, NJ, pp. 785-790, 1990.

[3] P. Beckmann and A. Spizzichino, The Scattering of Electromagnetic Waves from Rough Surfaces, Norwood, Artech House, 1987.

[4] M. Born and E. Wolf, Principles of Optics, Pergamon Press, London, 1965.

[5] P. Breton, L. A. Iverson, M. S. Langer, and S. W. Zucker, Shading Flows and Scenel Bundles: A New Approach to Shape from Shading, TR-CIM-91-9, McGill University, Nov. 1991.

[6] C. R. Brice and C. L. Fennema, “Scene analysis using regions,” Artificial Intelligence 1, pp. 205-226, 1970.

[7] M. H. Brill, “Image Segmentation by Object Color: A Unifying Framework and Connection to Color Constancy,” Journal of the Optical Society of America A 7(10), pp. 2041-2047, 1990.

[8] M. H. Brill, “Photometric models in multispectral machine vision,” in Physics-Based Vision: Principles and Practice, Color, ed. G. Healey, S. Shafer, and L. Wolff, Boston, Jones & Bartlett Publishers, 1992.

[9] M. Cohen and D. Greenberg, “The hemi-cube: a radiosity solution for complex environments,” Computer Graphics (Proc. of SIGGRAPH-85), pp. 31-40, 1985.

[10] R. L. Cook and K. E. Torrance, “A Reflectance Model for Computer Graphics,” Computer Graphics 15(3), pp. 307-316, 1981.

[11] T. Darrell, S. Sclaroff, and A. Pentland, “Segmentation by Minimal Description,” in Proceedings of International Conference on Computer Vision, IEEE, pp. 112-116, 1990.

[12] J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes, Computer Graphics: Principles and Practice, 2nd edition, Addison Wesley, Reading, MA, 1990.

[13] X. D. He, K. E. Torrance, F. X. Sillion, and D. P. Greenberg, “A Comprehensive Physical Model for Light Reflection,” Computer Graphics 25(4), pp. 175-186, 1991.

[14] G. Healey, “Using color for geometry-insensitive segmentation,” Journal of the Optical Society of America A 6(6), pp. 920-937, June 1989.

[15] P. Hanrahan and W. Krueger, “Reflection from Layered Surfaces due to Subsurface Scattering,” Computer Graphics (Proc. of SIGGRAPH-93), pp. 165-174, 1993.

[16] B. K. P. Horn, Robot Vision, Cambridge, MIT Press, 1986.

[17] R. S. Hunter, The Measurement of Appearance, John Wiley and Sons, New York, 1975.

[18] D. B. Judd and G. Wyszecki, Color in Business, Science, and Industry, 3rd ed., John Wiley and Sons, New York, 1975.

[19] J. Kaufman and J. Christensen, IES Lighting Ready Reference, Illuminating Engineering Society of North America, 1985.

[20] G. J. Klinker, S. A. Shafer, and T. Kanade, “Using a Color Reflection Model to Separate Highlights from Object Color,” in Proceedings of International Conference on Computer Vision, IEEE, New York, pp. 145-150, June 1987.

[21] G. J. Klinker, S. A. Shafer, and T. Kanade, “A Physical approach to color image understanding,” International Journal of Computer Vision 4(1), pp. 7-38, 1990.

[22] J. Krumm and S. A. Shafer, “Segmenting Textured 3D Surfaces Using the Space/Frequency Representation,” to be printed in Spatial Vision, 1994.

[23] Y. G. Leclerc, “Constructing Simple Stable Descriptions for Image Partitioning,” International Journal of Computer Vision 3, pp. 73-102, 1989.

[24] H.-C. Lee, “Method for Computing the Scene-Illuminant Chromaticity from Specular Highlights,” Journal of the Optical Society of America A 3(10), pp. 1694-1699, 1986.

[25] H.-C. Lee, E. J. Breneman, and C. P. Schulte, “Modeling light reflection for color computer vision,” IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-12(4), pp. 402-409, April 1990.

[26] A. Leonardis, Image Analysis Using Parametric Models: Model-Recovery and Model-Selection Paradigm, PhD Thesis, LBV-93-3, University of Ljubljana, March 1993.

[27] A. Leonardis, A. Gupta, and R. Bajcsy, “Segmentation as the Search for the Best Description of the Image in Terms of Primitives,” in Proceedings of International Conference on Computer Vision, IEEE, pp. 121-125, 1990.

[28] P. H. Moon and D. E. Spencer, The Photic Field, Cambridge, MIT Press, 1981.

[29] S. Nayar, K. Ikeuchi, and T. Kanade, Surface Reflection: Physical and Geometrical Perspectives, CMU-RI-TR-89-7, Robotics Institute, Carnegie Mellon University, 1989.

[30] F. E. Nicodemus, J. C. Richmond, J. J. Hsia, I. W. Ginsberg, and T. Limperis, Geometrical Considerations and Nomenclature for Reflectance, National Bureau of Standards NBS Monograph 160, Oct. 1977.

[31] M. Oren and S. K. Nayar, Generalization of the Lambertian Model and Implications for Machine Vision, Columbia University, Computer Science TR CUCS-057-92.

[32] J. Rissanen, Stochastic Complexity in Statistical Inquiry, Singapore, World Scientific Publishing Co. Pte. Ltd., 1989.

[33] S. A. Shafer, “Using Color to Separate Reflection Components,” COLOR Research and Application 10, pp. 210-218, 1985.

[34] R. Stone and S. Shafer, “The Determination of Surface Roughness from Reflected Step Edges,” submitted to JOSA, 1993.

[35] J. M. Tenenbaum and H. G. Barrow, Experiments in Interpretation-Guided Segmentation, Artificial Intelligence Center, Stanford Research Institute Technical Note 123, March 1976.

[36] J. M. Tenenbaum, M. A. Fischler, and H. G. Barrow, “Scene Modeling: A Structural Basis for Image Description,” in Image Modeling, ed. Azriel Rosenfeld, New York, Academic Press, 1981.

[37] S. Tominaga and B. A. Wandell, “Standard surface-reflectance model and illuminant estimation,” Journal of the Optical Society of America A 6(4), pp. 576-584, April 1989.

[38] K. Torrance and E. Sparrow, “Theory for Off-Specular Reflection from Roughened Surfaces,” Journal of the Optical Society of America 57, pp. 1105-1114, 1967.

[39] L. B. Wolff, A Diffuse Reflectance Model for Dielectric Surfaces, The Johns Hopkins University, Computer Science TR 92-04, April 1992 (submitted to JOSA).

[40] L. B. Wolff and T. E. Boult, “Polarization/Radiometric Based Material Classification,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition, IEEE, San Diego, CA, pp. 387-395, 1989.

[41] Y. Yakimovsky and J. Feldman, “A semantics-based decision theory region analyzer,” in Proceedings 3rd International Joint Conference on Artificial Intelligence, pp. 580-588, 1973.


Plate 1. An object, a mirror image of the object, and a picture of the object.


Plate 5. Illumination environment: a white light source directly overhead.

Plate 6. Ambient light with a white light source to the right and behind.

Plate 7. Grey ambient light with red light reflected off another object.


Plate 8. Illustration of the transfer function for a slightly rough metal object.

Plate 9. Illustration of the transfer function for a slightly rough plastic object.

Plate 12. Image divided into idealized uniform chromaticity regions.

Plate 13. Diffuse illumination environment.

Plate 17. Fundamental hypothesis with illumination as color source: (a) surface, (b) illumination environment, (c) transfer function.




Plate 18. Three-color Lambertian sphere.

Plate 19. Three hypothesis sets in the example image.


Plate 22. Hypothesis 3: planar-uniform ill.-colored dielectric

Plate 23. Hypothesis 4: curved-uniform ill.-colored dielectric

Plate 24. Hypothesis 5: planar-general ill.-grey metal

Plate 25. Hypothesis 6: curved-general ill.-grey metal

Plate 26. Hypothesis 7: planar-uniform ill.-colored metal

Plate 27. Hypothesis 8: curved-uniform ill.-colored metal

Plate 28. Hypothesis 9: planar-general ill.-grey or white dielectric

Plate 29. Hypothesis 10: curved-general ill.-grey or white dielectric


